Abstract

F<: is a highly expressive typed λ-calculus with subtyping. This paper describes an
implementation of F<: (extended with recursive types), and documents the algorithms
used. Using this implementation, one can test F<: programs and examine typing derivations.

To facilitate the writing of complex F<: encodings, we provide a flexible syntax-ex- 
tension mechanism. New syntax can be defined from scratch, and the existing syntax can 
be extended on the fly. It is possible to introduce new binding constructs, while avoiding 
problems with variable capture. 

To reduce the syntactic clutter, we provide a practical type inference mechanism that 
is applicable to any explicitly typed polymorphic language. Syntax extension and type in- 
ference interact in useful ways. 
1. Introduction



F<: is a typed λ-calculus with subtyping. It is intended to capture the essence of subtyping and, to
some extent, of object-oriented programming [Cardelli, et al. 1991; Curien, Ghelli 1991]. The F<: calculus was
designed to be as small as possible, so that it could be studied formally. Its small size also happens to facilitate
implementation; during its construction it was possible to explore some advanced techniques that
should be useful for larger languages.

This paper describes the F-sub program, which is an implementation of F<: . (We assume a superficial 
familiarity with the latter.) Using this program, one can typecheck and evaluate F<: expressions and defini- 
tions, and examine typing and subtyping derivations. In order to keep the critical typing code clean and cor- 
rect, the implementation is very minimal and supports only the basic constructs of F<:. This minimality, 
while having some pragmatic disadvantages, allows us to describe the fundamental algorithms in full detail 
in terms of an operational semantics that is faithful to the actual program code. 

The operational semantics is described in the Appendix in layers of increasing complexity, the final 
layer corresponding closely to actual program code. The first layer corresponds to the typechecking algo- 
rithm for pure F<:. Then, other features are added: (a) de Bruijn indices, (b) partial type inference for sec- 
ond-order types, and (c) a new technique for integrating recursive types with second-order polymorphic 
types. 

Apart from the typing algorithms, another aspect of the implementation should be of general interest.
The extensible syntax mechanism we have implemented should be useful in other mechanized formal sys- 
tems that need to define mathematical notation on the fly, such as theorem provers, proof checkers, and 
symbolic algebra systems. In these systems, one wishes to minimize the number of constructs in order to 
keep the difficult core algorithms clean and manageable. In the case of F-sub, we wish to keep the typing 
code simple by not providing basic data structures and control structures, requiring instead that they be encoded
as λ-terms. The drawback of this approach is that after a few levels of encoding even simple programs
become quite unreadable. To improve readability of the encodings, the F-sub system supports a very
flexible syntax-extension mechanism based on an LL(1) parser. One can define entirely new grammars, or 
enrich the existing F-sub grammar. In particular, one can define new binding constructs and their associated 
meaning, while avoiding problems with variable capture. 

The F-sub system consists of about 10,000 lines of Modula-3 code [Nelson 1991], equally partitioned 
between a reusable parsing package and F-sub proper. The implementation is portable to any computer 
running Modula-3, that is, to almost any computer running a standard C compiler [Kalsow, Muller FTP].
Program sources and binaries for standard architectures are freely available [Cardelli FTP].

2. Overview 

The syntax of F-sub types and terms is given below, informally. As a general convention, term-related 
names begin with a lower-case letter, while type-related names begin with an upper-case letter.

A, B ::=              types
    X                 type variables
    Top               the biggest type
    A->B              function spaces
    All(X<:A)B        bounded quantification
    {A}               grouping

a, b ::=              terms
    x                 term variables
    top               the canonical member of Top
    fun(x:A)b         functions
    b(a)              applications
    fun(X<:A)b        polymorphic functions
    b(:A)             type applications
    {a}               grouping

When loaded, the F-sub system displays its '- ' prompt, at which one can write a term like 'top',
followed by a semicolon. The system answers by inferring the type of the term and evaluating it. The answers
given by the system are indicated by '⇒'.

- top;
⇒ top : Top

In general, at the ' - ' prompt one can write a phrase, always terminated by a semicolon. There are 
several kinds of phrases. The one above is a term phrase, while the one shown below is a type phrase; this 
is always preceded by a colon and causes the evaluation of a type: 

- :Top;
⇒ : Top

Type definition phrases, introduced by 'Let' and term definition phrases, introduced by 'let', allow 
one to bind types and terms to variables: 

- Let Id = All(X) X->X;
⇒ Let Id <: Top = <Id>

- :Id;
⇒ : <Id>

- let id : Id = fun(X) fun(x:X) x;
⇒ let id : <Id> = <id>

- id;
⇒ <id> : <Id>

The system produces some answers in angle brackets, as an abbreviation, to avoid printing excessive de- 
tails. If a term or a type has been given a name in a definition, then that term or type is printed as its given 
name in angle brackets. This printing heuristic has no effect on typing or evaluation. 

Once a function like 'id' is defined, it can be applied to types and terms. A type application has the 
form 'a(:A)' (note the ':'); a term application has the form 'a(b)'.

- id(:Id);
⇒ {fun(x:<Id>) x} : {<Id>-><Id>}

- id(:Id)(id);
⇒ <id> : <Id>

- id(:Id->Id);
⇒ {fun(x:<Id>-><Id>) x} : {{<Id>-><Id>}->{<Id>-><Id>}}

The evaluator does not perform reductions inside functions: 

- fun(x:Id) id(:Id)(x);
⇒ {fun(x:<Id>) <id>(:<Id>)(x)} : {<Id>-><Id>}

As you may notice from the printed output, curly brackets, instead of parentheses, are used to group 
syntax: 

- {fun(x:Top) x}(top);
⇒ top : Top

Programs can be stored in files. For example, we can prepare a file called 'Test.fsub' containing the
Church encoding of booleans:

Let Bool = All(X) X->X->X; 

let true: Bool = fun (X) fun(x:X) fun(y:X) x 
false: Bool = fun (X) fun(x:X) fun(y:X) y; 

We can then load this file into the system by a load phrase: 

- load Test; 

According to the encoding of booleans above, a conditional of the form 'if x then false else 
true end' is written as: 

x(:Bool) (false) (true) 
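
The encoding can be mimicked in an untyped setting to see why the conditional reduces as expected. The following is only an illustrative Python sketch, not F-sub code: the type argument 'X' and the type application '(:Bool)' are erased, and the names 'cond', 'not_', and 'to_python_bool' are hypothetical helpers introduced here.

```python
# Church booleans as untyped lambdas (the F-sub type parameter X is erased).
true = lambda x: lambda y: x     # selects its first argument
false = lambda x: lambda y: y    # selects its second argument


def cond(b, then_branch, else_branch):
    """Encode 'if b then t else e end' as b(t)(e)."""
    return b(then_branch)(else_branch)


# 'if x then false else true end', i.e. x(:Bool)(false)(true) without the type application.
not_ = lambda x: cond(x, false, true)


def to_python_bool(b):
    """Observe a Church boolean by letting it choose between True and False."""
    return b(True)(False)
```

Applying 'not_' to 'true' yields a function that behaves like 'false', which 'to_python_bool' makes observable.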

It is possible, however, to define a more familiar syntax for conditionals by a syntax extension, as fol- 
lows. 

A syntax phrase introduces a new grammar or, in this example, modifies the existing one: 

- syntax
termBase ::= ...
[ "if" term_1 "then" term_2 "else" term_3
"giving" type_4 "end" ]
=> _1(:_4)(_2)(_3);

To understand this example, one must first know that 'termBase', 'term', and 'type' are some of
the syntactic categories of F-sub given in Appendix C (a 'termBase' is a 'term' except for the right-recursive
syntax of applications). Here we wish to modify the syntax of 'termBase' by taking its existing
definition (indicated by '...') and adding conditional expressions. By this mechanism we truly modify the
recursive definition of terms, meaning that conditional expressions can be nested.

The grammar of conditionals is given above as a sequence (in square brackets) of keywords and num- 
bered 'term' and 'type' grammars. The numbers are used in the action part of the grammar (following 
'=>'), where the relevant pieces of the input are reassembled into the encoding of conditionals shown ear- 
lier. 

With the extended grammar we can write, for example: 

- let not =
fun(x:Bool)
if x then false else true giving Bool end;
⇒ let not : {<Bool>-><Bool>} = <not>

As another example of syntax extensions, we can define let terms (as opposed to the top-level-only 
'let' definitions), translating them into functions and applications: 

- syntax
termBase ::= ...
[ "let" termIde_1 ":" type_2 "=" term_3
"in" term_4 "end" ]
=> {fun(_1:_2)_4}(_3);

In this example we are creating a new binding construct. This is reflected by the use of 'fun(_1:' in
the action part. Here '_1' refers to a 'termIde', which is the F-sub grammar for a term identifier. Note
that '_4' is inside the scope of '_1', producing the expected variable capture. (Unwanted variable captures
are carefully avoided.) To try this out, we need to wrap the let-expressions in brackets to avoid confusion
with let-phrases:

- {let x: Bool = true in not(x) end};
⇒ <false> : <Bool>

In general, a term action (preceded by '=>') can be any F-sub term, possibly containing pattern variables
'_n'. Similarly, a type action (preceded by ':>') can be any F-sub type, possibly containing pattern
variables. An action can be appended to any piece of grammar. The pattern variables '_n' can similarly be
appended to any piece of grammar, using parentheses '(', ')' for grouping if necessary. After the definition
of a syntax extension for terms or types, the new syntax can be used in the action parts of later grammars.

As an exercise, one could now try to define the syntax of existential types 'Some(X<:A)B', giving
the translation into universal types 'All(Y){All(X<:A)B->Y}->Y'. For more complex tasks one
should first read section 3.3 on Actions. (Exercise hints. One has to modify 'typeBase', and capture a
'typeIde' and two 'type's. The symbol for type actions is ':>', not '=>'. To see what the parser produces,
write 'do ShowParsing On;'.)

3. Syntax extension 

In this section we describe a notation for grammars and its use in defining syntax extensions. This no- 
tation is used also in Appendix C to describe the formal syntax of F-sub. 

3.1 Grammars 

Our meta-notation for grammars is slightly non-standard. Moreover, its meaning is tightly associated 
with a particular parser (recursive descent). The reason for these peculiarities is that the same notation is 
used also for the syntax-extension facility within the language. 

Terminal symbols are called tokens; the most important kinds of tokens are identifiers 'abc', delimiters
')', and quoted strings '"abc"'. The identifiers can be either alphanumeric 'abc' or symbolic '->'.
Moreover, identifiers are split into keyword and non-keyword classes; keywords are not legal variable
names in binding constructs. See Appendix B for the full lexical details.

A grammar description g is one of the following constructions:

x             An identifier x represents a non-terminal grammar symbol, which must be bound to a
              grammar description. Parsing x is the same as parsing the associated description.

ide           The constant ide denotes a non-keyword identifier token. Parsing this succeeds when
              the next input token is a non-keyword identifier.

"fun"         A non-empty quoted string denotes the keyword or delimiter token given in quotes.
              Parsing this succeeds when the next input token is the given keyword or delimiter.

string        The constant string denotes a quoted-string token. Parsing this succeeds when the
              next input token is a quoted string.

[g1 ... gn]   Square brackets denote a sequence of grammars. Parsing this succeeds if parsing
              each gi in sequence succeeds. Parsing [] always succeeds.

{g1 ... gn}   Curly brackets denote a choice of grammars. Parsing this succeeds if parsing one of
              the gi's succeeds when trying them left to right. Parsing {} always fails. (If one of the
              gi's fails after successfully parsing an input token, then the entire parsing fails, but
              this can happen only if the grammar is not LL(1).)

(g1 * g2)     This iteration construct is equivalent to the grammar [g1 x] where x ::= {[g2 x] []}.
              However, the parsing of (g1 * g2) can build left-associative parse-trees (in conjunction
              with actions), which are not otherwise representable by a non-left-recursive
              grammar.

A complete grammar has the form:

x1 ::= g1 ... xn ::= gn

where n ≥ 1, the xi are distinct, and any x occurring in one of the gi is one of the xi. Moreover, the grammar
must be non-left-recursive and LL(1) (where 1 refers to one token, not one character). The grammar so defined
is the one defined by x1.

As an example, here is a non-ambiguous grammar for untyped λ-terms:

lambda ::= { ide func appl }
func ::= [ "[" ide "." lambda "]" ]
appl ::= [ "(" lambda lambda ")" ]
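
To make the parsing rules concrete, here is a small Python sketch of a recognizer that interprets grammar descriptions in the style described above. It is an illustration only, not the Modula-3 implementation: the tuple encoding of grammars, the names 'parse', 'accepts', 'TABLE', and 'ParseError', and the pre-tokenized input are all assumptions made for this example. It covers non-terminals, 'ide', quoted tokens, sequences, and choices, and models the rule that a failure after consuming a token aborts the whole parse.

```python
class ParseError(Exception):
    """An alternative failed after consuming input (the grammar is not LL(1) here)."""


def parse(g, toks, pos, table, keywords):
    """Return the position after parsing g at pos, or None if g fails without consuming input."""
    kind = g[0]
    if kind == "nt":        # non-terminal: parse the description it is bound to
        return parse(table[g[1]], toks, pos, table, keywords)
    if kind == "ide":       # non-keyword identifier token
        ok = pos < len(toks) and toks[pos].isalnum() and toks[pos] not in keywords
        return pos + 1 if ok else None
    if kind == "kw":        # quoted keyword or delimiter token
        return pos + 1 if pos < len(toks) and toks[pos] == g[1] else None
    if kind == "seq":       # [g1 ... gn]; the empty sequence always succeeds
        p = pos
        for item in g[1]:
            r = parse(item, toks, p, table, keywords)
            if r is None:
                if p != pos:    # input was already consumed: the entire parse fails
                    raise ParseError(f"unexpected input at token {p}")
                return None
            p = r
        return p
    if kind == "choice":    # {g1 ... gn}; the empty choice always fails
        for alt in g[1]:
            r = parse(alt, toks, pos, table, keywords)
            if r is not None:
                return r
        return None
    raise ValueError(kind)


# lambda ::= { ide func appl }, func ::= [ "[" ide "." lambda "]" ], appl ::= [ "(" lambda lambda ")" ]
TABLE = {
    "lambda": ("choice", [("ide",), ("nt", "func"), ("nt", "appl")]),
    "func": ("seq", [("kw", "["), ("ide",), ("kw", "."), ("nt", "lambda"), ("kw", "]")]),
    "appl": ("seq", [("kw", "("), ("nt", "lambda"), ("nt", "lambda"), ("kw", ")")]),
}


def accepts(toks):
    """True if the whole token list is a 'lambda'; may raise ParseError on a committed failure."""
    return parse(("nt", "lambda"), toks, 0, TABLE, set()) == len(toks)
```

For instance, the token list for '[x.(x y)]' is accepted, while '(x)' raises 'ParseError' because the 'appl' alternative fails only after the opening parenthesis has been consumed.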

Suppose now we wish to change the syntax of application from '(a b)' to 'a(b)'. The grammar becomes left- 
recursive, but this problem can be eliminated by distinguishing between simple terms and complex terms as 
shown below. The resulting grammar is LL(1), and the recursive-descent parser resolves any ambiguity: 

lambda ::= [ simple arg ]
arg ::= { [ par arg ] [] }
simple ::= { ide func par }
func ::= [ "fun" ide "." lambda ]
par ::= [ "(" lambda ")" ]

The grammar above parses λ-terms, but because of the way left-recursion was eliminated, application associates
to the right (that is, a(b)(c) parses as <a<(b)(c)>> instead of <<a(b)>(c)>), which complicates further
processing. This problem can be solved by the iteration operator '*', which intentionally associates to the
left. The grammar of λ-terms should then be expressed as follows:

lambda ::= ( simple * par )
simple ::= { ide func par }
func ::= [ "fun" ide "." lambda ]
par ::= [ "(" lambda ")" ]
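
The left-associating behavior of the iteration construct can be sketched as a loop that folds each newly parsed 'par' into the result built so far. The Python below is a hand-written illustration under simplifying assumptions (the 'func' alternative is omitted, identifiers are single alphanumeric tokens, and the function names and tuple-shaped trees are invented for this example); it is not the table-driven parser of F-sub.

```python
def parse_lambda(toks, pos=0):
    """lambda ::= ( simple * par ): fold every parsed 'par' into the tree built so far."""
    tree, pos = parse_simple(toks, pos)
    while pos < len(toks) and toks[pos] == "(":
        arg, pos = parse_par(toks, pos)
        tree = ("app", tree, arg)   # the previous result becomes the function: left-associative
    return tree, pos


def parse_simple(toks, pos):
    """simple ::= { ide par } (the 'func' alternative is left out of this sketch)."""
    if pos < len(toks) and toks[pos] == "(":
        return parse_par(toks, pos)
    if pos < len(toks) and toks[pos].isalnum():
        return toks[pos], pos + 1
    raise SyntaxError(f"unexpected input at token {pos}")


def parse_par(toks, pos):
    """par ::= [ "(" lambda ")" ]"""
    if pos >= len(toks) or toks[pos] != "(":
        raise SyntaxError(f"expected '(' at token {pos}")
    tree, pos = parse_lambda(toks, pos + 1)
    if pos >= len(toks) or toks[pos] != ")":
        raise SyntaxError(f"expected ')' at token {pos}")
    return tree, pos + 1
```

With this sketch, the tokens of 'a(b)(c)' produce the tree for 'a(b)' applied to 'c', rather than 'a' applied to '(b)(c)'.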

Implementation-specific warning: when used in the syntax-extension facility, non-LL(1) grammars
will typically cause parsing failures, and left-recursive grammars will cause non-termination. This is a 
property only of the current implementation; grammars could be analyzed to detect these situations. 

3.2. Syntax 

A syntax extension can be used to define a completely new grammar, or to modify an existing one. 
There are two forms; a syntax term, and a (top-level) syntax phrase. We have seen examples of syntax 
phrases earlier. Here we start with syntax terms, which have the form: 

syntax x1 ::= g1 ... xn ::= gn in ... end

The allowed forms for the gi's were explained in the previous section. The resulting grammar (x1) is then
used to parse the span of the grammar, which is the input stream after 'in'. If this parsing is successful, the 
keyword 'end' is expected, and then the current grammar reverts to the one that was active before entering 
the syntax term. 

The result of parsing a syntax term 's' is a 'term' according to the basic F-sub syntax of terms (that
is, where all the syntax extensions have been expanded). The expansion of a syntax term 's' into a 'term' 
is directed by the actions that are defined in 's'; if no action is specified, 's' expands simply to 'top'. For 
example, we define below a grammar with two possible parses, the keywords 'one' and 'two', and no ac- 
tions. (We use outer brackets to avoid confusing a syntax term with a syntax phrase.) 

- {syntax x::={"one" "two"} in one end};
⇒ top : Top

A quoted identifier like '"one"' is automatically made into a keyword in the relevant span. Keywords are 
inherited from outer spans to inner spans. (Hence the built-in F-sub keywords may conflict with syntax ex- 
tensions.) 

A top-level syntax phrase is a syntax term where the part 'in ... end' is missing; its span is the re- 
mainder of the top-level session (but see Section 4). 

A syntax phrase does not normally affect the immediate top-level syntax. That is, the non-terminal 
'phrase' given in Appendix C keeps being used for parsing at the '- ' prompt. 

But if the 'toplevel' keyword is used, then the first non-terminal of the given grammar is adopted
as the new top-level syntax, and the built-in F-sub syntax is completely bypassed: 

- syntax toplevel x::={"one" "two"};

- one
⇒ top : Top

- two
⇒ top : Top

Note that we are now stuck with 'x' as the top-level syntax; see Section 4 for recovering from this situation. 

Instead of defining a completely new grammar we can extend an existing one. In particular, we can 
extend the existing F-sub grammar described in Appendix C. Useful starting points for extension are 
'termBase' and 'typeBase' (but use 'term' and 'type' on the right-hand side of '::='). See Appendix
C for other non-terminals that can be extended; these are marked public.

To extend a non-terminal 'x' bound to a grammar 'g1', write:

x ::= ... g2

In this case 'x' becomes equivalent to the choice '{g1 g2}'. In particular, 'x ::= ... {}' has no effect,
while 'x ::= ... []' makes 'x' optional. For example:

- {syntax termBase ::= ... {"one" "two"}
in {fun(x:Top) one}(two) end};
⇒ top : Top
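
The effect of an extension on a grammar table can be sketched as rebinding the non-terminal to a choice between its old description and the new one. The Python below is a hypothetical model, not the actual implementation: the tuple encoding and the names 'extend' and 'match' are assumptions of this example, and 'match' is a bare recognizer that ignores the LL(1) commit rule.

```python
def extend(table, name, g2):
    """Model 'name ::= ... g2': rebind name to the choice { g1 g2 } of old and new grammar."""
    table[name] = ("choice", [table[name], g2])


def match(g, toks, pos, table):
    """Return the position after matching g at pos, or None on failure (keywords only)."""
    kind = g[0]
    if kind == "nt":
        return match(table[g[1]], toks, pos, table)
    if kind == "kw":
        return pos + 1 if pos < len(toks) and toks[pos] == g[1] else None
    if kind == "seq":       # the empty sequence [] always succeeds
        for item in g[1]:
            pos = match(item, toks, pos, table)
            if pos is None:
                return None
        return pos
    if kind == "choice":    # the empty choice {} always fails
        for alt in g[1]:
            r = match(alt, toks, pos, table)
            if r is not None:
                return r
        return None
    raise ValueError(kind)


table = {"x": ("kw", "one")}
extend(table, "x", ("kw", "two"))     # x ::= ... "two"
extend(table, "x", ("choice", []))    # x ::= ... {}   (no effect on what x accepts)
```

After these two extensions 'x' accepts 'one' or 'two' and nothing else; a further 'extend(table, "x", ("seq", []))' would make 'x' optional, since the empty sequence matches without consuming input.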

The final topic of this section is how to add infix operators. This is achieved by extending grammars 
that begin with an iteration construct, as opposed to extending arbitrary grammars as shown above. 
To extend the iteration part of a non-terminal 'x' bound to an iteration grammar '(g1 * g2)', write:

x ::= ... * g3

Then 'x' becomes equivalent to the iteration '(g1 * {g2 g3})'.

In Appendix C we provide a non-terminal 'termOper' as a suitable place for attaching infix opera- 
tors. This 'termOper' is an iteration based on 'termAppl'. The latter is another iteration that parses ap- 
plications, and is in turn based on 'termBase'. Finally, 'termBase' terms are those simple terms that do 
not have pieces of syntax "hanging off to the right". Given this structure, one can attach infix operators to 
'termOper' that will have lower precedence than application. 

The following iteration extension introduces '+' as a left-associative infix operator over
'termOper's:

termOper ::= ... * [ "+" termAppl ]

achieving the equivalent of 'termOper ::= ( termAppl * [ "+" termAppl ] )'.

The following iteration extension introduces '-' as a right-associative infix operator over
'termOper's:

termOper ::= ... * [ "-" termOper ]

achieving the equivalent of 'termOper ::= ( termAppl * [ "-" termOper ] )'.

Similarly, 'typeOper' and 'typeBase' can be used for new infix type operators (there is no 
'typeAppl'). The syntax of 'type' in Appendix C implies that these operators will have higher prece- 
dence than '->'. 

3.3. Actions



Actions can be attached to grammars. They describe the terms that are to be generated during parsing 
of syntax extensions. By using 'g_n' in a grammar, one specifies that the result of parsing 'g' should be
stored in the pattern variable '_n'. Pattern variables are then used in actions 'a'; the grammars 'g=>a' and
'g:>a' specify, respectively, that the parsing of 'g' should generate the term or type described by 'a'.

We now describe the rules of expansion. The expansion generated by a (successfully parsed) grammar
is defined as follows:

Grammar        Expansion

x              the expansion generated by the grammar bound to x.
ide            top.
"fun"          top.
string         top.
[g1 ... gn]    top.
{g1 ... gn}    the expansion generated by the successful gi.
(g1 * g2)      the expansion generated by either g1, if g1 alone is successful, or by the
               last g2 if [g1 g2 ... g2] is successful.
g=>a           the expansion generated by the term pattern a (see below).
g:>a           the expansion generated by the type pattern a (see below).
g_n            top, but in addition the expansion generated by g is stored in _n.
(g1 *_n g2)    the expansion generated by (g1 * g2), but at each iteration
               the latest expansion is also stored in _n.



A pattern variable '_n' (with n non-negative) is defined when an occurrence of 'g_n' is parsed successfully.
The range of definition of a pattern variable '_n' is always confined within a clause 'x ::= g'. In
addition, a pattern variable defined in a branch of a choice is confined to that branch, and one defined in the
'g2' of '(g1 * g2)' is confined to 'g2'. Errors are given on attempts to define a pattern variable twice, or to
use one that is not currently defined.

An action 'a' may contain the pattern variables '_n' that are defined where 'a' appears. Note that an 
action 'a' in 'g=>a' can access pattern variables defined outside 'g' in the surrounding grammar; this abil- 
ity greatly increases the expressive power of actions. An action may also contain ordinary program vari- 
ables bound in the surrounding scope. 

An action appearing after '=>' can be any term pattern. This is, recursively, either a 'term' (including 
any syntax extension of 'term') or one of the following patterns: 

_n
fun(_n:type-pattern) term-pattern
fun(_n<:type-pattern) term-pattern
fun(_n) term-pattern

The expansion generated by a term pattern is the result of instantiating the term pattern with the expansions
stored in the pattern variables '_n' that occur in it.

Similarly to term patterns, an action appearing after ':>' can be any type pattern, which is, recursively,
either a 'type' or one of the following patterns:

_n
All(_n<:type-pattern) type-pattern
All(_n) type-pattern

We are careful to avoid variable capture when patterns are instantiated. Considering 'fun(...:A)b'
binders (the others are handled similarly), we have the typical situations:

(1) fun(x:A) x(_1)        (2) fun(_1:A) x(_2)

In 'fun(x:A)' situations, including example (1), the variable 'x' is consistently renamed so that it does
not capture other variables named 'x' when the pattern is instantiated. In 'fun(_1:A)' situations, variable
capture on instantiation is normally desired, but only for certain subexpressions. In example (2) we never
want the variable that replaces '_1' to capture 'x', but we always want the variable that replaces '_1' to
capture the similarly named variables in the term that replaces '_2'. The general situation is handled by two
separate renaming environments during instantiation: one for resident bound variables ('x', in (1)) and one
for intruding bound variables (the ones replacing '_1' in (2)). Different subexpressions of the pattern are
renamed according to the appropriate environment. Variables that are free in an action and bound in the
top-level environment are allowed, but may produce error messages later when at risk of being captured.

3.4. How it is done 

The implementation of syntax extensions is really quite simple, when properly organized. Grammars 
are stored in tables associating non-terminal names to grammar descriptions; this association can be 
changed dynamically to extend existing non-terminals. Grammar descriptions include client "action proce- 
dures" to be invoked during parsing to build the abstract syntax trees: no intermediate parse trees are built, 
resulting in very efficient parsing. Intermediate parsing results are kept on a stack, accessed by (the equiva- 
lent of) pattern variables. 

A simple recursive-descent parser interprets these grammar tables blindly, dispatching on the various 
cases of grammar descriptions and calling the action routines when indicated. The action routines attached 
to the built-in syntax of grammars build grammars. The action routines attached to the syntax of actions, 
invoke an external "Act" interface to instantiate patterns. Nothing in this parser and syntax-extension ma- 
chinery is specific to the implementation of F-sub; in fact, it could be and has been reused for other lan- 
guages. 

The built-in F-sub syntax is just a grammar table, so that it can be modified like any other grammar. 
The only parsing code specific to F-sub is provided in the implementation of the interface "Act", used by 
the parser to instantiate the pattern variables within term and type patterns. This module is responsible for 
preventing variable captures, and hence must be aware of the scoping structures of the language at hand. 

The sophisticated hiding and sharing of information needed to separate the parser from the rest of the 
system, is realized via the Modula-3 partially-opaque-types mechanism. 

We now discuss in more detail how actions are instantiated so that variable capture is avoided. The basic
technique is described in the simplified context of a λ-calculus with λ-patterns. The technique is then
instantiated three times in F-sub, for 'All(X<:A)', 'fun(X<:A)', and 'fun(x:A)' binders.

A pattern p is described by the following data structure:

\[ p = x \mid \lambda x.p \mid p\,p' \mid \mathbf{x} \mid \lambda\mathbf{x}.p \]

where the pattern variables $\mathbf{x}$ (corresponding to '_n' in F-sub) are distinct from the ordinary variables $x$.

We use renamings $\rho$, mapping (non-pattern-) variables to (non-pattern-) variables, and instantiations $\pi$,
mapping pattern-variables to patterns. Here are the corresponding data structures and related operations:



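\[
\begin{array}{ll}
\rho = \varepsilon \mid x \leftarrow y, \rho' & \quad x \notin dom(\rho') \\
\pi = \varepsilon \mid \mathbf{x} \leftarrow p, \pi' & \quad \mathbf{x} \notin dom(\pi')
\end{array}
\]

\[
\begin{array}{ll}
dom(\rho)\text{: domain} & rng(\rho)\text{: range} \\
dom(\varepsilon) = \emptyset & rng(\varepsilon) = \emptyset \\
dom(x \leftarrow y, \rho) = \{x\} \cup dom(\rho) & rng(x \leftarrow y, \rho) = \{y\} \cup rng(\rho) \\[1ex]
dom(\pi)\text{:} & rng(\pi)\text{:} \\
dom(\varepsilon) = \emptyset & rng(\varepsilon) = \emptyset \\
dom(\mathbf{x} \leftarrow p, \pi) = \{\mathbf{x}\} \cup dom(\pi) & rng(\mathbf{x} \leftarrow p, \pi) = \{p\} \cup rng(\pi)
\end{array}
\]

\[
\begin{array}{ll}
\rho(x)\text{: lookup} & \pi(\mathbf{x})\text{:} \\
\varepsilon(x) = x & \varepsilon(\mathbf{x}) = \mathbf{x} \\
(z \leftarrow y, \rho)(x) = \rho(x) \quad (z \neq x) & (\mathbf{z} \leftarrow p, \pi)(\mathbf{x}) = \pi(\mathbf{x}) \quad (\mathbf{z} \neq \mathbf{x}) \\
(x \leftarrow y, \rho)(x) = y & (\mathbf{x} \leftarrow p, \pi)(\mathbf{x}) = p
\end{array}
\]

\[
\begin{array}{ll}
\rho \backslash x\text{: restriction} & \pi \backslash \mathbf{x}\text{:} \\
\varepsilon \backslash x = \varepsilon & \varepsilon \backslash \mathbf{x} = \varepsilon \\
(z \leftarrow y, \rho) \backslash x = z \leftarrow y, \rho \backslash x \quad (z \neq x) & (\mathbf{z} \leftarrow p, \pi) \backslash \mathbf{x} = \mathbf{z} \leftarrow p, \pi \backslash \mathbf{x} \quad (\mathbf{z} \neq \mathbf{x}) \\
(x \leftarrow y, \rho) \backslash x = \rho & (\mathbf{x} \leftarrow p, \pi) \backslash \mathbf{x} = \pi
\end{array}
\]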



With these operations, we can define the notions of free variables (FV), pattern variables (PV), and 
binding pattern variables (BPV) of a pattern. 

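\[
\begin{array}{lll}
FV(p)\text{:} & PV(p)\text{:} & BPV(p)\text{:} \\
FV(x) = \{x\} & PV(x) = \emptyset & BPV(x) = \emptyset \\
FV(\lambda x.p) = FV(p) - \{x\} & PV(\lambda x.p) = PV(p) & BPV(\lambda x.p) = BPV(p) \\
FV(p\,p') = FV(p) \cup FV(p') & PV(p\,p') = PV(p) \cup PV(p') & BPV(p\,p') = BPV(p) \cup BPV(p') \\
FV(\mathbf{x}) = \emptyset & PV(\mathbf{x}) = \{\mathbf{x}\} & BPV(\mathbf{x}) = \emptyset \\
FV(\lambda\mathbf{x}.p) = FV(p) & PV(\lambda\mathbf{x}.p) = \{\mathbf{x}\} \cup PV(p) & BPV(\lambda\mathbf{x}.p) = \{\mathbf{x}\} \cup BPV(p)
\end{array}
\]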
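
The three definitions translate directly into recursive functions over a pattern data type. The following Python sketch assumes a tagged-tuple representation invented for this illustration: '("var", x)', '("lam", x, p)', and '("app", p, q)' for ordinary constructs, and '("pvar", n)' and '("plam", n, p)' for a pattern variable and a λ binding a pattern variable.

```python
def fv(p):
    """Free (ordinary) variables of a pattern."""
    tag = p[0]
    if tag == "var":
        return {p[1]}
    if tag == "lam":
        return fv(p[2]) - {p[1]}
    if tag == "app":
        return fv(p[1]) | fv(p[2])
    if tag == "pvar":
        return set()
    if tag == "plam":
        return fv(p[2])
    raise ValueError(tag)


def pv(p):
    """All pattern variables occurring in a pattern."""
    tag = p[0]
    if tag == "var":
        return set()
    if tag == "lam":
        return pv(p[2])
    if tag == "app":
        return pv(p[1]) | pv(p[2])
    if tag == "pvar":
        return {p[1]}
    if tag == "plam":
        return {p[1]} | pv(p[2])
    raise ValueError(tag)


def bpv(p):
    """Pattern variables that occur in binding position."""
    tag = p[0]
    if tag in ("var", "pvar"):
        return set()
    if tag == "lam":
        return bpv(p[2])
    if tag == "app":
        return bpv(p[1]) | bpv(p[2])
    if tag == "plam":
        return {p[1]} | bpv(p[2])
    raise ValueError(tag)


# Untyped analogue of situation (2), 'fun(_1:A) x(_2)': a pattern-λ over the application x _2.
example2 = ("plam", 1, ("app", ("var", "x"), ("pvar", 2)))
```

For 'example2', the only free variable is 'x', both pattern variables occur, and only '_1' is a binding pattern variable.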

Free variables and pattern variables are then extended to renamings and instantiations. 

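\[
\begin{array}{ll}
FV(\rho)\text{:} & PV(\rho)\text{:} \\
FV(\rho) = rng(\rho) & PV(\rho) = \emptyset \\[1ex]
FV(\pi)\text{:} & PV(\pi)\text{:} \\
FV(\pi) = \bigcup \{ FV(p) \mid p \in rng(\pi) \} & PV(\pi) = \bigcup \{ PV(p) \mid p \in rng(\pi) \}
\end{array}
\]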

Finally, we define the effect of applying renamings and instantiations to patterns. 



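\[
\begin{array}{ll}
p[\rho]\text{:} & \\
x[\rho] = \rho(x) & \\
(\lambda x.p)[\rho] = \lambda x'.p[x \leftarrow x', \rho \backslash x] & \quad x' \notin FV(p),\ x' \notin dom(\rho) \cup rng(\rho) \\
(p\,p')[\rho] = p[\rho]\ p'[\rho] & \\
\mathbf{x}[\rho] = \mathbf{x} & \\
(\lambda\mathbf{x}.p)[\rho] = \lambda\mathbf{x}.p[\rho] &
\end{array}
\]
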
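\[
\begin{array}{ll}
p[\pi]\text{:} & \quad \text{assuming } \mathbf{x} \in BPV(p) \Rightarrow \pi(\mathbf{x}) \text{ is a variable } y \\
p[\pi] = p[\varepsilon; \varepsilon; \pi] & \\
x[\rho; \rho'; \pi] = x[\rho] & \\
(\lambda x.p)[\rho; \rho'; \pi] = \lambda x'.p[x \leftarrow x', \rho \backslash x; \rho'; \pi] & \quad x' \notin FV(p),\ x' \notin [\rho; \rho'; \pi] \\
(p\,p')[\rho; \rho'; \pi] = p[\rho; \rho'; \pi]\ p'[\rho; \rho'; \pi] & \\
\mathbf{x}[\rho; \rho'; \pi] = \pi(\mathbf{x})[\rho'] & \\
(\lambda\mathbf{x}.p)[\rho; \rho'; \pi] = \lambda y'.p[\rho; \pi(\mathbf{x}) \leftarrow y', \rho' \backslash \pi(\mathbf{x}); \pi] & \quad y' \notin FV(p),\ y' \notin [\rho; \rho'; \pi]
\end{array}
\]

where $x \notin [\rho; \rho'; \pi] \Leftrightarrow x \notin FV(\pi) \cup dom(\rho') \cup rng(\rho') \cup dom(\rho) \cup rng(\rho)$.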

In $p[\rho; \rho'; \pi]$, we use $\rho$ to rename the bound variables found in $p$, and we use $\rho'$ to rename the variables
found in the range of $\pi$ that are placed in binding positions.

We now discuss these definitions and the reasons for their side-conditions.
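
The two-environment instantiation can be sketched in Python as follows. This is an illustrative reading of the definitions, not the code of F-sub (which works over de Bruijn indices): the tagged-tuple patterns, the use of dictionaries for the renamings and for the instantiation, the fresh-name scheme, and the names 'fresh', 'rename', and 'instantiate' are all assumptions of this example.

```python
import itertools


def fv(p):
    """Free ordinary variables of a pattern."""
    tag = p[0]
    if tag == "var":
        return {p[1]}
    if tag == "lam":
        return fv(p[2]) - {p[1]}
    if tag == "app":
        return fv(p[1]) | fv(p[2])
    return set() if tag == "pvar" else fv(p[2])   # 'pvar' or 'plam'


def fresh(base, avoid):
    """Pick a variable name derived from base that is not in avoid."""
    for i in itertools.count(1):
        if f"{base}{i}" not in avoid:
            return f"{base}{i}"


def rename(p, rho):
    """p[rho]: apply a renaming, always renaming ordinary bound variables to fresh names."""
    tag = p[0]
    if tag == "var":
        return ("var", rho.get(p[1], p[1]))
    if tag == "app":
        return ("app", rename(p[1], rho), rename(p[2], rho))
    if tag == "pvar":
        return p
    if tag == "plam":
        return ("plam", p[1], rename(p[2], rho))
    x, body = p[1], p[2]                            # 'lam'
    x2 = fresh(x, fv(body) | set(rho) | set(rho.values()))
    return ("lam", x2, rename(body, {**rho, x: x2}))


def instantiate(p, pi, rho=None, rho2=None):
    """p[rho; rho2; pi]: rho renames resident binders, rho2 renames intruding ones."""
    rho, rho2 = rho or {}, rho2 or {}
    tag = p[0]
    if tag == "var":
        return ("var", rho.get(p[1], p[1]))
    if tag == "app":
        return ("app", instantiate(p[1], pi, rho, rho2), instantiate(p[2], pi, rho, rho2))
    if tag == "pvar":
        return rename(pi.get(p[1], p), rho2)
    body = p[2]
    avoid = fv(body) | set(rho) | set(rho.values()) | set(rho2) | set(rho2.values())
    for q in pi.values():
        avoid |= fv(q)
    if tag == "lam":                                # resident binder: rename via rho
        x2 = fresh(p[1], avoid)
        return ("lam", x2, instantiate(body, pi, {**rho, p[1]: x2}, rho2))
    target = pi.get(p[1])                           # 'plam': intruding binder
    if target is None or target[0] != "var":
        raise ValueError("a binding pattern variable must be replaced by a variable")
    y2 = fresh(target[1], avoid)
    return ("lam", y2, instantiate(body, pi, rho, {**rho2, target[1]: y2}))
```

Taking the untyped analogues of situations (1) and (2) from section 3.3 with every pattern variable replaced by a variable named 'x', the resident binder in (1) is renamed away from the incoming 'x', while in (2) the intruding binder captures the 'x' that replaces '_2' but not the resident 'x'.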

Although eventually we must obtain a ground pattern (free of pattern variables) for evaluation, we 
cannot require that every pattern instantiation immediately produces a ground pattern. This is because, in 
order to define new syntax extensions in terms of old ones, extended syntax may appear in actions. For example,
consider the syntax extensions and actions '[x "+" y] => plus(x)(y)' and '[x "avg" y]
=> div(y+x)(two)'. When parsing the latter action we have a non-ground instantiation of
'div(y+x)(two)' to 'div(plus(y)(x))(two)'. Only later, when 'one avg three' is met, do we
obtain a ground pattern 'div(plus(three)(one))(two)'.

However, we cannot allow arbitrary non-ground instantiations. Of course, we cannot replace a binding
pattern variable with, for example, a λ-abstraction. But in addition, it seems we cannot replace a binding
pattern variable with another pattern variable; otherwise we could write:
$(\lambda\mathbf{x}.\lambda\mathbf{y}.\mathbf{x}\,\mathbf{y})[\mathbf{x} \leftarrow \mathbf{z}, \mathbf{y} \leftarrow \mathbf{z}, \varepsilon] = (\lambda\mathbf{z}.\lambda\mathbf{z}.\mathbf{z}\,\mathbf{z})$,
which causes a pattern-variable capture. This justifies the restriction ($\mathbf{x} \in BPV(p) \Rightarrow \pi(\mathbf{x})$ is a
variable $y$) in the definition of $p[\pi]$. Note that binding pattern variables cannot be α-converted because they
are "visible from the outside".

The informal idea that "there are no variable captures" should be formalized by showing that the renaming
or instantiation of α-equivalent patterns produces α-equivalent patterns, and by deriving expected
properties of substitutions. We leave this for future work.

3.5. Related work on syntax extension 

Griffin [Griffin 1988] has enumerated desirable properties of notational definitions and has studied 
their formalization. Our distinction between normal λ's and pattern-λ's seems to remain implicit in his
work. Unlike Griffin, who translates to combinator forms that then reduce to the desired programs, we
synthesize those programs directly. (Griffin would handle our 'let x=a in b end' example by translating
to 'LET(λx.b)(a)' for an appropriate combinator 'LET'.) Moreover, while Griffin discusses abstract
translations, we provide a specific grammar definition technique and an efficient parsing algorithm.
Parsing is efficient because it is LL(1) and because it avoids the creation of intermediate parse trees, pro- 
ducing directly abstract syntax trees that do not require normalization. 

Bove and Arbilla [Bove, Arbilla 1992] discuss how to use explicit substitutions to implement syntax 
extensions. This is an elegant idea that we could perhaps have adopted, but we managed to work with ordi- 
nary substitutions over de Bruijn indices. As in the previous case, their work does not describe a parsing al- 
gorithm, but is theoretically well developed. 

Some language implementations, like CAML and SML, integrate a YACC or similar parser generator
that allows them to introduce new syntax [Mauny, Rauglaudre 1992]. If the new syntax is to be mixed with 
the old one, the new syntax must be quoted in some way. Instead, we can freely intermix new and old syn- 
tax without special quotations. 

Hygienic macros [Kohlbecker, et al. 1986] share many of the same goals as our syntax extensions; 
however, these macros account only for macro calls and not for liberally introducing new syntax. Hygienic 
macros employ a multiple-pass time-stamping algorithm to prevent variable capture; this algorithm is, at 
least operationally, different from our single-pass multiple-environment algorithm. We do not handle quotation
and antiquotation in the style of Lisp.

Finally, our syntax extension mechanism guarantees termination of parsing, even when our "macros" 
are recursively defined. This property does not hold for many macro mechanisms that are computationally 
powerful. 

4. Mock-modules and save-points 

A crude modularization mechanism is provided as an aid to the interactive loading and reloading of 
definitions. Separate compilation is not a goal. 

To facilitate loading and reloading the file, say, 'One.fsub' containing F-sub definitions, one should
start that file with the following phrase (the module name must be the same as the file name):

module One;

If this file relies on definitions contained in files 'Two.fsub' and 'Three.fsub' (which should in turn
start with the lines 'module Two;' and 'module Three;' respectively) then 'One.fsub' should
start with the phrase:

module One import Two Three; 

Then the variables defined inside Two and Three become available within One. 

A reload phrase can be issued at the top-level to load a module or to force reloading it ('load', which
was briefly discussed in Section 2, will not reload a module that is already loaded):

- reload One; 

The meaning of 'reload One;' is simply to read the Unix file './One.fsub'. A quoted string can
also be placed after 'reload', in which case the indicated file name is used without modification.

The intent of reloading a (file containing a) module, is to backtrack to the point in time when that 
module was first loaded. All the intervening top-level definitions (including syntax extensions) are re- 
tracted. When reloading a module, only the imported modules that are not already present are reloaded; in 
particular, a module imported through two different import paths is loaded once. 

The precise behavior of this module mechanism is now described in terms of some lower-level primi- 
tives that handle save-points. In contrast to module phrases, which are mostly useful when used within files, 
save-points may be useful also when interacting at the top-level. For example, they are available even when 
the top-level syntax has been clobbered by the syntax extension mechanism. 

A save-point is a record of the complete state of the system at a given point in time. 

- save that; 

This phrase creates a save-point called, in this case, 'that', recording the state of the system at the
moment it is issued. Named save-points are stacked.
Later, one can issue the phrase: 

- restore that; 

which resets the system back to the 'that' save-point, possibly obliterating top-level definitions as well as
intervening save-points with different names. The save-point 'that' is, however, maintained.
A special save-point exists in the beginning; the phrase: 

- restore; 

restores the system to its initial condition just after start-up. 

- establish that; 

This phrase is equivalent to 'save that;' if a save-point called 'that' does not exist, and to
'restore that;' if a save-point called 'that' does exist.

- load that; 

This is equivalent to 'reload that;' (that is, just reading the file) if a save-point called 'that' does
not exist, but is a no-op if a save-point called 'that' already exists.

We can now describe the precise meaning of 'module' phrases. A module of the form: 

module One import Two Three; . . . 

is simply treated as the sequence: 

load Two; load Three; establish One; ... 

where the 'load' phrases may end up establishing the corresponding modules because of module phrases 
in the loaded files. 
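
The save-point primitives and the expansion of 'module' phrases described above can be modeled in a few lines. The following Python sketch is only an illustration under stated assumptions: the class and function names (TopLevel, module, let) are hypothetical, the "file system" is a dictionary of phrase lists, and the system state is reduced to a list of definition names. It is not the F-sub implementation.

```python
class TopLevel:
    """Toy model of the save-point stack (all names here are hypothetical)."""

    def __init__(self):
        self.defs = []            # top-level definitions, oldest first
        self.points = [("", 0)]   # stack of (name, number of defs); "" is the initial save-point
        self.files = {}           # module name -> list of phrases (stands in for files)

    def define(self, name):
        self.defs.append(name)

    def _find(self, name):
        for i, (n, _) in enumerate(self.points):
            if n == name:
                return i
        return None

    def save(self, name):
        self.points.append((name, len(self.defs)))

    def restore(self, name=""):
        i = self._find(name)
        if i is None:
            raise KeyError(name)
        del self.points[i + 1:]             # drop intervening save-points
        del self.defs[self.points[i][1]:]   # retract later definitions

    def establish(self, name):
        if self._find(name) is None:
            self.save(name)
        else:
            self.restore(name)

    def reload(self, name):
        for phrase in self.files[name]:     # "just read the file"
            phrase(self)

    def load(self, name):
        if self._find(name) is None:        # no-op if already established
            self.reload(name)


def module(name, imports=()):
    """'module name import ...;' treated as 'load ...; establish name;'."""
    def phrase(top):
        for m in imports:
            top.load(m)
        top.establish(name)
    return phrase


def let(name):
    """A stand-in for any definition phrase."""
    return lambda top: top.define(name)


top = TopLevel()
top.files = {
    "Two": [module("Two"), let("two")],
    "Three": [module("Three"), let("three")],
    "One": [module("One", ["Two", "Three"]), let("one")],
}
top.reload("One")
top.define("scratch")
top.reload("One")   # backtracks to the point where One was first loaded
```

In this model the second 'reload One' retracts 'scratch' and the old definition of 'one', and does not reread Two or Three, because their save-points already exist.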

5. Top-level phrases 

The top-level phrases fall into several classes. We have described mock-modules, save-points and load- 
ing in Section 4. We now expand on the definition and evaluation phrases sketched in Section 2. Moreover, 
we discuss judgment phrases and command phrases. 

All the phrases that involve types or terms are elaborated as follows. The parsing phase expands the 
syntax extensions. Then, a scoping phase expands type definitions, converts identifiers to de Bruijn indices, 
and detects unbound identifiers. Next, a checking phase verifies the typing correctness of types and terms. 
Then, an evaluation phase normalizes terms. Finally, a printing phase prints the results; identifiers with the 
same name but different de Bruijn indices are decorated in different ways. If an error occurs in one of these 
phases, the file name (if any) and the line position of the error is reported. 
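
As an illustration of the scoping phase only, the following Python sketch converts named terms to de Bruijn indices and reports unbound identifiers. The term representation (tagged tuples) and the function name are assumptions made for this example; indices are taken to start at 1 for the innermost binder.

```python
def to_de_bruijn(term, env=()):
    """Convert a named term to de Bruijn form.

    term is ("var", x), ("fun", x, body) or ("app", f, a);
    env lists the binders in scope, innermost first.
    """
    tag = term[0]
    if tag == "var":
        name = term[1]
        if name not in env:
            raise NameError(f"unbound identifier: {name}")
        return ("var", env.index(name) + 1)   # 1 = innermost binder
    if tag == "fun":
        _, name, body = term
        return ("fun", to_de_bruijn(body, (name,) + env))
    if tag == "app":
        return ("app", to_de_bruijn(term[1], env), to_de_bruijn(term[2], env))
    raise ValueError(f"unknown term: {tag}")


# fun(x) fun(y) x: the occurrence of x refers to the second binder out.
k = ("fun", "x", ("fun", "y", ("var", "x")))
k_indexed = to_de_bruijn(k)
```

Shadowed names resolve to the nearest binder because the environment is searched from the innermost entry, which is also why a printer needs to decorate identifiers that share a name but differ in index.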

Each phrase is elaborated in the context of the previous top-level phrases. 
• A type definition phrase has the form: 

- Let X1<:A1 = B1 ... Xn<:An = Bn;

where the bounds '<:Ai' can be omitted, with 'Ai' defaulting to 'Top'. Each 'A(i+1)' and 'B(i+1)' is in the
scope of 'X1' ... 'Xi' and of all the previous top-level definitions. Type definitions are fully expanded before
typechecking.

• A term definition phrase has the form: 

- let x1:A1 = b1 ... xn:An = bn;

where the domains ':Ai' can be omitted, with 'Ai' being inferred from 'bi'. Each 'A(i+1)' and 'b(i+1)' is in the
scope of 'x1' ... 'xi' and in the scope of all the previous top-level definitions.

• A type phrase has the form: 

- :A; 

which results in checking the type 'A' with respect to the current top-level definitions. 

• A term phrase has the form: 

- a; 

which results in checking and evaluating the term 'a' with respect to the current top-level definitions. 

• An environment E (often also called a context or an assignment) is a possibly empty sequence of
either type variables with a bound ('X<:A') or term variables with a domain ('x:A'). Each variable is in the
scope of the environment to its left and in the scope of the top-level definitions.

• A judgment is one of the four formal statements axiomatized in Appendix D, each involving an envi- 
ronment. Each of the four statements has a corresponding phrase, as follows. 

An environment judgment phrase has the form: 

- judge env E; 

where the environment is in the scope of the previous top-level definitions (and similarly for the follow- 
ing judgments). 

A type judgment phrase has the form: 

- judge type E |- A;

A subtype judgment phrase has the form: 

- judge subtype E |- A<:B;

Finally, a term judgment phrase has the form:

- judge term E |- a: A; 

If the correctness of one of these judgments is established, a simple 'ok' is printed. It is informative to 
turn on tracing (as described below) when elaborating judgments. 

• A command phrase is used to switch on and off various options. It has the form: 

- do command argument ; 

One can get a list of all the available commands by writing: 

- do; 

and one can find out about an individual command by writing: 

- do command; 

The command 'do Version;' prints the current version of the system.

After issuing the command 'do ShowParsing On;' the result of parsing each phrase is printed.
This is useful for debugging syntax extensions.

After issuing the command 'do ShowVarIndex On;' the de Bruijn indices of variables are printed
along with the variables.

The command 'do QuantifierSubtyping X;' switches between the undecidable F<: rule for
quantifier subtyping (X = LeastBound), the decidable Fun rule [Cardelli, Wegner 1985] (X =
EqualBounds) and a decidable rule proposed by Giuseppe Castagna (X = TopBound).

After issuing the command 'do TraceType On;', each call to the type routine of Appendix E is
traced. Similarly, 'do TraceSubtype On;' and 'do TraceTerm On;' correspond to the sub and
term routines.

Some other commands are used for system debugging and are not documented here. 

6. Type inference by "argument synthesis" 

In pure F-sub one has to write down an often overwhelming amount of type information. This is al- 
ready evident when encoding something as simple as pairing constructs. For example, through syntax ex- 
tensions we can define a cartesian product operator 'A*B' as 'All (C) { A->B->C } ->C' (see Appendix 
A), along with the operations: 

pair: All (A) All (B) A->B->A*B 
fst: All (A) All (B) A*B->A 
snd: All (A) All (B) A*B->B 

To create and manipulate simple pairs we have to write, for example:

- let a = pair(:Bool)(:Top)(true)(top); (* the pair (true, top) *)

- fst(:Bool)(:Top)(a); (* the first component of pair a *)

A triple is already quite a challenge:

- pair(:Bool)(:Top*Bool)(true)(
    pair(:Top)(:Bool)(top)(false)); (* the triple (true, top, false) *)

What is worse, we cannot even define a syntax extension of the form, for example, ' a , b' for pairs, because 
the type arguments must be provided somehow. 

Fortunately, a form of type inference is available. To enable it, we append question marks ' ? ' to the 
type parameters that we would like to omit (loosely following [Pollack 1990]). For example, the polymor- 
phic identity could be written: 

- Let Id = All (X?) X->X; 

- let id: Id = fun(X?) fun(x:X) x;

Then, the type arguments corresponding to question-mark parameters must be omitted: 

- id(top); (* instead of id(:Top) (top) *) 
t> top : Top 

In this situation, we say that the type parameter 'X' of 'Id' is stripped, to compensate for the missing type 
argument, and that the argument is later synthesized. 

A type quantifier is stripped by introducing a fresh unification variable that may be instantiated later, or
never; a unification algorithm is responsible for the synthesis of the argument. Type parameters are stripped 
if and only if they appear at the beginning of the type of a term identifier (that is, not an arbitrary term): we 
found this restriction useful both for the inference algorithm and in understanding how inference behaves in 
actual programs. Here is a situation where stripping occurs, and a unification variable is exposed in the 
printed result: 

- id; 

t> {fun(x:X?)x} : {X?->X?} 
If needed, we can prevent stripping by placing an exclamation mark after a term identifier: 

- id!; 

t> <id> : <Id> 

This option is useful, for example, if we want to pass the (unstripped) polymorphic identity as an argument 
to another term. 

Going back to pairs, we can now rewrite our primitives so that they admit type inference: 

pair: All (A? ) All (B? ) A->B->A*B 
fst: All (A?) All (B?) A*B->A 
snd: All (A?) All (B?) A*B->B 

This allows us to write triples a bit more compactly, by omitting the type arguments: 

- pair (true) (pair (top) (false)); (* the triple (true, top, false) *) 

But, what is more important, we can now put syntax extensions to work and define a simple 'a,b'
notation:

- syntax
    termOper ::= ... *_1
      [ "," termOper_2 ] => pair(_1)(_2);

- true, top, false; (* the triple (true, top, false) *)

We are finally able to write pairs in a convenient notation, by the interplay of type inference and syntax ex- 
tensions. 

We conclude this section with some general remarks about this form of type inference; details of the 
algorithm are in Appendix F. 

The types 'All(X?<:A)B' and 'All(X<:A)B' are incomparable. A type 'All(X?<:Top)B' is
stripped to 'B' where 'X' is treated as a fresh unification variable. Instead, a type 'All(X?<:A)B' with a
non-'Top' bound is stripped simply to 'B{X<-A}'.

When an occurrence of 'X' bound by an 'X?' appears nested within other quantifiers, it must not be in- 
stantiated in a way that will cause variable captures. To this end, we used first-order unification under a 
mixed prefix [Miller (to appear)]. For a practical example of where this matters, see the Existentials section 
in Appendix A. As an ad-hoc example, consider the following term where a type parameter is omitted in 
the application of f : 

- fun(f:All(Y?){All(W)W->Y}->Y) f(fun(Z)fun(z:Z)z);

t> Type error. Type inference rank check: instantiation type for 
Y? contains a (different) variable Z that is bound deeper 
than the Y? binder: 

Z (=W) (last input line, char 46) 
Error detected (last input line, char 51) 

If ordinary unification is used instead of mixed-prefix unification, we match All(W)W->Y? against
All(Z)Z->Z, causing the unification of Y? with Z. Hence the whole term above acquires the type:

{All(Y?){All(W)W->Y}->Y} -> Z

where the final Z (which is unified with Y) has escaped its scope and remains unbound. We can then
provide the following argument for the term above, obtaining a term that has the escaped Z as its type:

fun(Y?) fun(g:All(W)W->Y) g(:Top)(top)
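
The scope condition that rules out this example can be sketched as a level check performed whenever a unification variable is instantiated. The Python fragment below is a simplified illustration, not the mixed-prefix algorithm itself: the representation, the notion of "level" (number of binders open when a variable was introduced), and the function names are all assumptions of this sketch.

```python
def rigid_vars(ty):
    """Yield the bound (rigid) variables occurring in a type.

    Types are ("var", name, level), ("uvar", name, level) or ("arrow", a, b).
    """
    if ty[0] == "var":
        yield ty
    elif ty[0] == "arrow":
        yield from rigid_vars(ty[1])
        yield from rigid_vars(ty[2])


def instantiate(uvar, ty, subst):
    """Record uvar := ty, rejecting types that mention a variable
    bound deeper than the point where uvar was introduced."""
    _, name, level = uvar
    for _, v, vlevel in rigid_vars(ty):
        if vlevel > level:
            raise TypeError(
                f"instantiation type for {name}? contains variable {v} "
                f"bound deeper than the {name}? binder")
    subst[name] = ty


subst = {}
y = ("uvar", "Y", 1)                        # Y? introduced with one binder open
instantiate(y, ("var", "A", 1), subst)      # A was already in scope: accepted
try:
    instantiate(y, ("var", "Z", 2), {})     # Z is bound deeper: rejected
    escaped = True
except TypeError:
    escaped = False
```

Under these assumptions, the rejected case corresponds to the attempt to unify Y? with Z (= W) in the example above.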

After typechecking an entire top-level phrase, some of the unification variables used for type inference 
may remain undetermined. We choose to tolerate this situation in term-phrases ('a; '), but we report an er- 
ror in term-definition-phrases ('let...;'). 

We believe that our type inference algorithm is essentially the same as the one used in LEGO [Pollack 
1990] and a first-order version of the one used in ELF [Pfenning 1989] (although we have no detailed 
knowledge of those implementations). We believe the algorithm is sound, but is not complete, particularly 
because we are using unification in a subtyping context. As a heuristic, this inference algorithm works ex- 
ceedingly well. 

7. Recursion 

In this section we describe an extension of F-sub with recursive types and recursive values. The inte- 
gration of recursion with subtyping in a first-order system is studied in [Amadio, Cardelli 1991]. The ideas 
described there should work in a second-order system such as F<: . However, here we take a simpler ap- 
proach to recursive types, to minimize their interference with second-order types and type inference tech- 
niques. 

The main idea is that the isomorphism between a recursive type 'Rec(X)B' and its unfolding
'B{X<-Rec(X)B}' is made explicit in the syntax of terms. As a first approximation, we have:

unfold : Rec(X)B -> B{X<-Rec(X)B}
fold   : B{X<-Rec(X)B} -> Rec(X)B

More precisely, we extend the syntax of F-sub as follows: 

A,B ::= ...             types as before, plus:
        Rec(X)B         recursive types

a,b ::= ...             terms as before, plus:
        fold(:A)(b)     fold b into an element of the recursive type A
        unfold(b)       unfold an element b of a recursive type
        rec(x:A)b       recursive terms

Since the isomorphism is explicit, we do not have 'Rec(X)B' = 'B{X<-Rec(X)B}'. Instead, two
recursive types are equal only if their respective 'Rec' binders are found in corresponding positions. Given
this restriction, the recursive subtyping algorithm becomes much simpler (while remaining non-trivial). The 
central type rule for recursive subtyping is unchanged, but the auxiliary judgment and rules having to do 
with type equality [Amadio, Cardelli 1991] are dropped. The type rules and algorithms are described in Ap- 
pendix G. 

As a simple use of recursive types, let us define the type of untyped lambda-terms, and some standard 
combinators. 

Let V = Rec(V) V->V;

let lam: {V->V}->V = fun(f:V->V) fold(:V)(f)
    app: V->{V->V} = fun(f:V) fun(a:V) unfold(f)(a);

let i: V = lam(fun(x:V) x)
    k: V = lam(fun(x:V) lam(fun(y:V) x))
    s: V = lam(fun(x:V) lam(fun(y:V) lam(fun(z:V)
             app(app(x)(z))(app(y)(z)))));

let y: V = rec(y:V) lam(fun(f:V) app(f)(app(y)(f)));

With a bit of syntax extension one can eliminate the 'lam' and 'app' clutter (see Appendix A). 
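
For readers who want to experiment outside F-sub, the same encoding can be mimicked in Python. This is only an analogy under stated assumptions: Python is untyped, so 'fold' and 'unfold' degenerate into wrapping and unwrapping a function in a hypothetical class V, and the recursive combinator y is omitted.

```python
class V:
    """Stand-in for Rec(V) V->V: a wrapped function from V to V."""

    def __init__(self, f):
        self.f = f                     # plays the role of fold(:V)(f)


def lam(f):
    return V(f)


def app(f):
    return lambda a: f.f(a)            # plays the role of unfold(f)(a)


i = lam(lambda x: x)
k = lam(lambda x: lam(lambda y: x))
s = lam(lambda x: lam(lambda y: lam(lambda z:
        app(app(x)(z))(app(y)(z)))))

# s k k behaves as the identity.
skk = app(app(s)(k))(k)
```

Here app(skk)(i) evaluates to the very object i, as expected from the reduction s k k x = k x (k x) = x.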

Appendices



Appendix A. Examples 

Identity 

This is the file 'Id.fsub'. It defines the polymorphic identity in such a way that its type parameter
can be omitted.

module Id;

Let Id = All(X?) X->X;

let id: Id = fun(X?) fun(x:X) x;

Unit 

This is the file 'Unit.fsub'. 'Unit' is the encoding of a data type with a single element 'unit'. It
is essentially the same as the polymorphic identity, but because of the intended use of 'unit', type
inference is not desirable.

module Unit;
(* Defines:
     Unit = All(X) X->X
     unit: Unit
*)

Let Unit = All(X) X->X;

let unit: Unit = fun(X) fun(x:X) x;

Booleans 

This is the file 'Bool.fsub'. This is the encoding of a data type with two elements, 'true' and 
'false'. Also provided are two subtypes of 'Bool' containing one element each. Standard boolean opera- 
tors are defined. The syntax of terms is extended with two keywords 'true' and 'false', with condi- 
tionals, and with two infix operators. 

module Bool; 
(* Defines: 

Bool = All (X) X->X->X 

True, False <: Bool 

true, false: Bool 

tt: True 

ff: False 

not : Bool->Bool 

and, or, _/\_, _\/_: Bool->Bool->Bool 
if _ then _ else _ end 

*) 

Let Bool = All(X) X->X->X
    True = All(X) X->Top->X
    False = All(X) Top->X->X;

let true: Bool = fun(X) fun(x:X) fun(y:X) x
    false: Bool = fun(X) fun(x:X) fun(y:X) y;

let tt: True = fun(X) fun(x:X) fun(y:Top) x
    ff: False = fun(X) fun(x:Top) fun(y:X) y;

let cond = fun(X?) fun(b:Bool) b(:X);

(* Bool, true, and false are turned into keywords *)
syntax
  typeBase ::= ...
    "Bool" :> Bool
  termBase ::= ...
    { "true" => true
      "false" => false
      ["if" term_1 "then" term_2 "else" term_3 "end"]
        => cond(_1)(_2)(_3) }
  ;

let not: Bool->Bool =
      fun(x:Bool) if x then false else true end
    and: Bool->Bool->Bool =
      fun(x:Bool) fun(y:Bool)
        if x then y else false end
    or: Bool->Bool->Bool =
      fun(x:Bool) fun(y:Bool)
        if x then true else y end;

syntax
  termOper ::= ... *_1
    { [ "/\\" termAppl_2 ] => and(_1)(_2)
      [ "\\/" termAppl_2 ] => or(_1)(_2) }
  ;



Products 

This is the file 'Product.fsub'. It defines a cartesian product operator, extending the syntax of
types, and a pairing operator, extending the syntax of terms. Syntax extensions and type inference interact
in this situation, so that pairs can be constructed simply by infixing a ','.

module Product;
(* Defines:
     A*B = All(C) {A->B->C}->C
     All(A?) All(B?) A->B->A*B
     pair: All(A?) All(B?) A->B->A*B
     fst: All(A?) All(B?) A*B->A
     snd: All(A?) All(B?) A*B->B
*)

syntax
  typeOper ::= ... *_1
    ["*" typeOper_2]
      :> All(C) {_1->_2->C}->C
  ;

let pair: All(A?) All(B?) A->B->A*B =
  fun(A?) fun(B?) fun(a:A) fun(b:B)
    fun(C) fun(p:A->B->C) p(a)(b);

let fst: All(A?) All(B?) A*B->A =
  fun(A?) fun(B?) fun(p:A*B) p(:A)(fun(a:A) fun(b:B) a);

let snd: All(A?) All(B?) A*B->B =
  fun(A?) fun(B?) fun(p:A*B) p(:B)(fun(a:A) fun(b:B) b);

syntax
  termOper ::= ... *_1
    ["," termOper_2] => pair(_1)(_2)
  ;

Sums 

This is the file 'Sum.fsub'. It defines a disjoint union operator, extending the syntax of types, and a 
'case' construct extending the syntax of terms. Note that 'case' introduces local bindings. 

module Sum;
(* Defines:
     A+B = All(C) {A->C}->{B->C}->C
     inl: All(A?) All(B?) A->A+B
     inr: All(A?) All(B?) B->A+B
     sum: All(A?) All(B?) All(C?) A+B->{A->C}->{B->C}->C
     case term
       inl(ide:type) term
       inr(ide:type) term
     end,
*)

syntax
  typeOper ::= ... *_1
    [ "+" typeOper_2 ]
      :> All(C) {_1->C}->{_2->C}->C
  ;

let inl: All(A?) All(B?) A->A+B =
  fun(A?) fun(B?) fun(a:A)
    fun(C) fun(f:A->C) fun(g:B->C) f(a);

let inr: All(A?) All(B?) B->A+B =
  fun(A?) fun(B?) fun(b:B)
    fun(C) fun(f:A->C) fun(g:B->C) g(b);

let sum: All(A?) All(B?) All(C?) A+B->{A->C}->{B->C}->C =
  fun(A?) fun(B?) fun(C?)
    fun(s:A+B) fun(f:A->C) fun(g:B->C)
      s(:C)(f)(g);

syntax
  termBase ::= ...
    ["case" term_1
       "lft" "(" termIde_2 ":" type_3 ")" term_4
       "rht" "(" termIde_5 ":" type_6 ")" term_7
     "end"]
      => sum(_1)(fun(_2:_3)_4)(fun(_5:_6)_7)
  ;

Tuples 

This is the file 'Tuple.fsub'. It defines type tuples as iterated cartesian products ending with 'Top', 
so that a longer tuple type is a subtype of a shorter tuple type. Note that the previously defined syntax for 
cartesian products is used here to provide a further syntax extension. Tuple values are iterated pairings end- 
ing with 'top'. 

module Tuple 
import Product; 
(* Defines: 

Tuple (type . . . type) 

tuple (term . . . term) 

*) 

syntax
  typeBase ::= ...
    [ "Tuple" "(" typeTuple_1 ")" ] :> _1
  typeTuple ::=
    { [ type_1 typeTuple_2 ] :> _1 * _2
      [] :> Top }
  ;

syntax
  termBase ::= ...
    [ "tuple" "(" termTuple_1 ")" ] => _1
  termTuple ::=
    { [ term_1 termTuple_2 ] => _1 , _2
      [] => top }
  ;

Inductive Lists

This is the file 'IndList.fsub'. It defines 'List(A)' data types encoded by inductive definitions 
(that is, without using recursion over types). Syntax extensions are used here to simulate a third-order oper- 
ator ('List') within a second-order language: 'List (A) ' is a second-order type only for a fixed 'A'. Note 
that the action for 'List' uses a local variable 'L' that must be kept distinct from any variable that may ap- 
pear in a parameter to 'List'; this is taken care of by the action instantiation algorithm. The syntax of 
terms is extended with a case construct and a convenient way of building lists of many elements; again, 
syntax extensions and type inference interact in interesting ways. 

module IndList 
import Bool; 
(* Defines: 

List (A) = All(L) L->{A->L->L}->L, 

nil : All (A?) List (A) , 

cons: All (A?) A->List (A) ->List (A) , 

null: All (A?) List (A) ->Bool , 

hd: All (A?) List (A) ->A->A, 

tl: All (A?) List (A) ->List (A) 

caseList term 
nil ( ) term 

cons (ide : type ide:type) term 
end, 

list (term . . . term) 

*) 

syntax
  typeBase ::= ...
    [ "List" "(" type_1 ")" ]
      :> All(L) L->{_1->L->L}->L
  ;

let nil: All(A?) List(A) =
  fun(A?) fun(L) fun(n:L) fun(c:A->L->L) n;

let cons: All(A?) A->List(A)->List(A) =
  fun(A?) fun(hd:A) fun(tl:List(A))
    fun(L) fun(n:L) fun(c:A->L->L)
      c(hd)(tl(:L)(n)(c));

let iterList: All(A?) All(B?) List(A)->B->{A->B->B}->B =
  fun(A?) fun(B?) fun(l:List(A))
    fun(n:B) fun(c:A->B->B)
      l(:B)(n)(c);

syntax
  termBase ::= ...
    { "nil" => nil
      "cons" => cons
      ["caseList" term_1
         "nil" "(" ")" term_2
         "cons" "(" termIde_3 ":" type_4 termIde_5 ":" type_6 ")" term_7
       "end"]
        => iterList(_1)(_2)(fun(_3:_4) fun(_5:_6)_7) }
  ;

let null: All(A?) List(A)->Bool =
  fun(A?) fun(l:List(A))
    caseList l
      nil() true
      cons(hd:A tl:Bool) false
    end;

let hd: All(A?) List(A)->A->A =
  fun(A?) fun(l:List(A)) fun(a:A)
    caseList l
      nil() a
      cons(hd:A tl:A) hd
    end;

let tl: All(A?) List(A)->List(A) =
  fun(A?) fun(l:List(A))
    caseList l
      nil() nil
      cons(hd:A tl:List(A)) tl
    end;

syntax
  termBase ::= ...
    [ "list" "(" termList_1 ")" ] => _1
  termList ::=
    { [ term_1 termList_2 ] => cons(_1)(_2)
      [] => nil }
  ;



Recursive Lists 

This is the file 'RecList.fsub'. It provides the same constructions as 'IndList.fsub', except 
that lists are encoded via recursive types. Note how the operators provided here encapsulate the folding and 
unfolding of recursion, so that they need not be used directly. 

module RecList 
import Bool; 
( * Defines : 

List (A) = Rec(L) All (C) C-> { A->L->C } ->C, 

nil : All (A?) List (A) , 

cons: All (A?) A->List (A) ->List (A) , 

null: All (A?) List (A) ->Bool , 

hd: All (A?) List (A) ->A->A, 

tl: All (A?) List (A) ->List (A) , 

caseList term 
nil ( ) term 

cons (ide : type ide:type) term 
end, 

list (term . . . term) 

*) 

syntax
  typeBase ::= ...
    [ "List" "(" type_1 ")" ]
      :> Rec(L) All(C) C->{_1->L->C}->C
  ;

let nil: All(A?) List(A) =
  fun(A?)
    fold(:List(A))(fun(C) fun(n:C) fun(c:A->List(A)->C) n);

let cons: All(A?) A->List(A)->List(A) =
  fun(A?) fun(hd:A) fun(tl:List(A))
    fold(:List(A))(fun(C) fun(n:C) fun(c:A->List(A)->C) c(hd)(tl));

let recList: All(A?) All(B?) B->{A->List(A)->B}->List(A)->B =
  fun(A?) fun(B?) fun(n:B) fun(c:A->List(A)->B)
    fun(l:List(A)) unfold(l)(:B)(n)(c);

syntax
  termBase ::= ...
    { "nil" => nil
      "cons" => cons
      ["caseList" term_1
         "nil" "(" ")" term_2
         "cons" "(" termIde_3 ":" type_4 termIde_5 ":" type_6 ")" term_7
       "end"]
        => recList(_2)(fun(_3:_4) fun(_5:_6)_7)(_1)
    };

let null: All(A?) List(A)->Bool =
  fun(A?) fun(l:List(A))
    caseList l
      nil() true
      cons(hd:A tl:List(A)) false
    end;

let hd: All(A?) List(A)->A->A =
  fun(A?) fun(l:List(A)) fun(a:A)
    caseList l
      nil() a
      cons(hd:A tl:List(A)) hd
    end;

let tl: All(A?) List(A)->List(A) =
  fun(A?) fun(l:List(A))
    caseList l
      nil() nil
      cons(hd:A tl:List(A)) tl
    end;

syntax
  termBase ::= ...
    [ "list" "(" termList_1 ")" ] => _1
  termList ::=
    { [ term_1 termList_2 ] => cons(_1)(_2)
      [] => nil }
  ;



Existentials 

This is the file 'Some.fsub'. Bounded and unbounded existential quantifiers are encoded in terms of 
universal quantifiers. Syntax is provided which is analogous to the built-in syntax for universal quantifica- 
tion. 

module Some; 
(* Defines 

Some ( ide ) type , Some ( ide< : type ) type 

pack ide< : type=type as type with term end 

open term as ide<:type ide: type in term end 

*) 

(* easy version:
syntax
  typeBase ::= ...
    [ "Some" "(" typeIde_1 "<:" type_2 ")" type_3 ]
      :> All(V?) {All(_1<:_2)_3->V}->V
  ; *)

(* some interesting pattern-variable manipulation: *)
syntax
  typeBase ::= ...
    [ "Some" "(" typeIde_1
        { ["<:" type_2 ")" type_3]
            :> All(V?) {All(_1<:_2)_3->V}->V
          [")" type_3]
            :> All(V?) {All(_1)_3->V}->V
        }_4
    ] :> _4
  ;

syntax
  termBase ::= ...
    { [ "pack" typeIde_1 "<:" type_2 "=" type_3 "as" type_4
        "with" term_5 "end" ]
        => fun(V?) fun(f:All(_1<:_2)_4->V) f(:_3)(_5)
      [ "open" term_1 "as" typeIde_2 "<:" type_3 termIde_4 ":" type_5
        "in" term_6 "end" ]
        => _1(fun(_2<:_3) fun(_4:_5)_6) }
  ;

(* Example:

load Bool; load Product;

Let Spec = Some(X<:Bool) X*{X->Bool};

let impl: Spec =
  pack X<:Bool=True as X*{X->Bool}
  with tt, fun(x:True) true end;

open impl as X<:Bool p:X*{X->Bool}
in snd(p)(fst(p)) end;

Note: trying to extract fst(p) rightfully causes a type-inference
rank-check, which would not be captured by the normal first-order
unification algorithm.
*)

Untyped λ-terms

This is the file 'Scott.fsub'. It uses recursive types to encode the untyped λ-calculus.

module Scott;
(* Defines
     V = V->V
     \x e             untyped lambda
     e.e              untyped application
     i, k, s, y: V    the usual combinators
*)

Let V = Rec(V) V->V;

syntax
  termBase ::= ...
    ["\\" termIde_1 term_2]
      => fold(:V)(fun(_1:V)_2)
  termOper ::= ... *_1
    ["." termAppl_2]
      => unfold(_1)(_2);

let i = \x x
    k = \x \y x
    s = \x \y \z {x.z}.{y.z}
    y = \f {\x f.{x.x}}.{\x f.{x.x}};

(* Note: the evaluator is eager; k.i.{y.i} will diverge. To fix this, use

module Scott
import Unit;

Let V = Rec(V) {Unit->V}->V;

syntax
  termBase ::= ...
    { ["@" termIde_1]
        => _1(unit)
      ["\\" termIde_1 term_2]
        => fold(:V)(fun(_1:Unit->V)_2) }
  termOper ::= ... *_1
    ["." termAppl_2]
      => unfold(_1)(fun(u:Unit)_2);

let i = \x @x;
let k = \x \y @x;
let s = \x \y \z {@x.@z}.{@y.@z};
let y = \f {\x @f.{@x.@x}}.{\x @f.{@x.@x}};
*)



Appendix B. Lexicon 

The ASCII characters are divided into the following classes: 

Blank HT LF FF CR SP 

Reserved " ' ~ 

Delimiter (),.;[]_{}?! 

Special #$%&* + -/ :< = >@\" 

Digit 0123456789 

Letter ABCDEFGHIJKLMNOPQRSTUVWXYZ 

abcdefghijklmnopqrstuvwxyz 

Illegal all the others 

Moreover: 

- a StringChar is either
    - any single character that is not an Illegal character or one of ''', '"', '\'
    - one of the pairs of characters '\'', '\"', '\\'

- a Comment is, recursively, a sequence of non-Illegal characters and comments
  enclosed between '(*' and '*)'.

From these, the following lexemes are formed: 

Space a sequence of Blanks and Comments. 

AlphaNum a sequence of Letters and Digits starting with a Letter. 

Symbol a sequence of Specials. 

Char a single StringChar enclosed between two '''.

String a sequence of StringChars enclosed between two '"'.

Int a sequence of Digits, possibly preceded by a single minus sign.

Delimiter a single Delimiter character. 

A stream of characters is split into lexemes by always extracting the longest prefix that is a lexeme. 
Note that Delimiters do not stick to each other or to other tokens even when they are not separated by 
Space, but some care must be taken so that Symbols are not inadvertently merged. 
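
The longest-prefix rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions: comments, Char and String lexemes are left out, the set of Special characters is taken from the table above with '^' and '|' added as a guess, and the names LEXEMES and lex are hypothetical.

```python
import re

# One regular expression per lexeme class (a simplified subset).
LEXEMES = [
    ("Space", re.compile(r"[ \t\n\f\r]+")),
    ("AlphaNum", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("Symbol", re.compile(r"[#$%&*+\-/:<=>@\\^|]+")),   # assumed Special set
    ("Int", re.compile(r"-?[0-9]+")),
    ("Delimiter", re.compile(r"[(),.;\[\]_{}?!]")),
]


def lex(text):
    """Split text into (class, lexeme) pairs, always taking the longest prefix."""
    pos, out = 0, []
    while pos < len(text):
        best = None
        for kind, rx in LEXEMES:
            m = rx.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError(f"illegal character at position {pos}")
        kind, m = best
        if kind != "Space":          # Space lexemes produce no tokens
            out.append((kind, m.group()))
        pos = m.end()
    return out


tokens = lex("b(:X)")
merged = lex("x:-1")
```

In this sketch 'b(:X)' splits cleanly because delimiters never stick to their neighbours, whereas in 'x:-1' the ':' and '-' merge into the single Symbol ':-', which is the kind of inadvertent merging the note above warns about.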

A token is either a Char, String, Int, Delimiter, Identifier, or Keyword. Once a stream of characters has 
been split into lexemes, tokens are extracted as follows. 

- Space lexemes do not produce tokens. 

- Char, String, Int, and Delimiter lexemes are also tokens. 

- AlphaNum and Symbol lexemes are Identifier tokens, except when they have been 

explicitly declared to be keywords, in which case they are Keyword tokens. 

Appendix C. Syntax 

The predefined keywords are: 

For grammar definitions: 

char end ide in int string syntax ::= => :> * = 
For F-sub proper: 

All Let Rec Top fold fun judge let rec top unfold : <: -> = |- 

♦ The grammar of phrases is as follows. 

phrase ::=                                   (* public *)
  { [ "Let" typeBinding ";" ]
    [ "let" termBinding ";" ]
    [ ":" type ";" ]
    [ term ";" ]
    [ synDecl ";" ]
    [ "judge" { [ "env" env ]
                [ "type" env "|-" type ]
                [ "subtype" env "|-" type "<:" type ]
                [ "term" env "|-" term ":" type ]
              } ";" ]
    [ "reload" { ide string } ";" ]
    [ "restore" { ide [] } ";" ]
    [ { "save" "establish" "load" } ide ";" ]
    [ "module" ide { "import" ideList } ";" ]
    [ "do" { [ ide { ide [] } ] [] } ";" ]
  }

ideList ::= 

{ [ ide ideList ] [] } 

typeBinding ::= 

{ [ ide { [ "< : " type ] [] } "=" type typeBinding ] [] } 

termBinding ::= 

{ [ ide { [ " : " type ] [] } "=" term termBinding ] [] } 

env ::= 

{[ide {["<:" type] [":" type] }][]} 

♦ The grammar of types and terms is as follows. 

pvar ::= 

[ "_" int ] 

binder ::= 

{ ide pvar } 

type ::=                                     (* public *)
  [ typeOper { [ "->" type ] [] } ]

typeOper ::=                                 (* public, hook for client infixes *)
  ( typeBase *_1 { } )

typeIde ::=                                  (* public *)
  ide

typeBase ::=                                 (* public *)
  { typeIde pvar "Top"
    [ "All" "(" binder { "?" [] } { [ "<:" type ] [] } ")" type ]
    [ "Rec" "(" ide ")" type ]
    [ "{" type "}" ] }

term ::=                                     (* public *)
  termOper

termOper ::=                                 (* public, hook for client infixes *)
  ( termAppl *_1 { } )

termAppl ::=                                 (* public *)
  ( termBase *_1
    { [ "(" { [ ":" type ] term } ")" ]
      "!" } )                                (* "!" must follow an identifier or keyword *)

termIde ::=                                  (* public *)
  ide

termBase ::=                                 (* public *)
  { termIde pvar "top"
    [ "fun" "(" binder { [ ":" type ] [ "<:" type ] [ "?" { [ "<:" type ] [] } ] [] } ")" term ]
    [ "fold" "(" ":" type ")" "(" term ")" ]
    [ "unfold" "(" term ")" ]
    [ "rec" "(" ide ":" type ")" term ]
    [ "{" term "}" ]
    synTerm }

The grammar of syntax extensions is as follows. Note that the grammar for synTerm cannot be 
written down precisely in this notation. 

synDecl ::= 

[ "syntax" { "toplevel" [] } grammar] 

synTerm ::= 

[ "syntax" grammar "in" ... "end" ] 

grammar ::= 
clauseSeq 

clauseSeq ::= 

  [ ide "::=" extends gramExp { clauseSeq [] } ]

extends ::=
  { [ "." "." "." { [ "*" { pvar [] } ] [] } ] [] }

gramExp ::= 

[ gramExpBase 
{ [ "=>" term ] 
[ " : >" type ] 
pvar 

[] }] 

gramExpBase ::= 

{ ide string "ide" "int" "char" "string" 
["["gramExpList "]"] 
[ " { " gramExpList " } " ] 

[ " (" gramExp { [ "*" { pvar [] } gramExp ][]}")"]} 

gramExpList ::= 

{ [ gramExp gramExpList ] [] } 
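
Appendix D. Typing rules

These are the typing rules of F-sub, as described in [Cardelli, et al. 1991].

Environments

(Env $\emptyset$)
\[ \frac{}{\vdash \emptyset\ \mathrm{env}} \]

(Env x)
\[ \frac{E \vdash A\ \mathrm{type} \qquad x \notin \mathrm{dom}(E)}{\vdash E,\ x{:}A\ \mathrm{env}} \]

(Env X)
\[ \frac{E \vdash A\ \mathrm{type} \qquad X \notin \mathrm{dom}(E)}{\vdash E,\ X{<:}A\ \mathrm{env}} \]

Types

(Type X)
\[ \frac{\vdash E,\ X{<:}A,\ E'\ \mathrm{env}}{E,\ X{<:}A,\ E' \vdash X\ \mathrm{type}} \]

(Type Top)
\[ \frac{\vdash E\ \mathrm{env}}{E \vdash \mathrm{Top}\ \mathrm{type}} \]

(Type ->)
\[ \frac{E \vdash A\ \mathrm{type} \qquad E \vdash B\ \mathrm{type}}{E \vdash A \rightarrow B\ \mathrm{type}} \]

(Type All)
\[ \frac{E,\ X{<:}A \vdash B\ \mathrm{type}}{E \vdash \mathrm{All}(X{<:}A)B\ \mathrm{type}} \]

Subtypes

(Sub refl)
\[ \frac{E \vdash A\ \mathrm{type}}{E \vdash A <: A} \]

(Sub trans)
\[ \frac{E \vdash A <: B \qquad E \vdash B <: C}{E \vdash A <: C} \]

(Sub X)
\[ \frac{\vdash E,\ X{<:}A,\ E'\ \mathrm{env}}{E,\ X{<:}A,\ E' \vdash X <: A} \]

(Sub Top)
\[ \frac{E \vdash A\ \mathrm{type}}{E \vdash A <: \mathrm{Top}} \]

(Sub ->)
\[ \frac{E \vdash A' <: A \qquad E \vdash B <: B'}{E \vdash A \rightarrow B <: A' \rightarrow B'} \]

(Sub All)
\[ \frac{E \vdash A' <: A \qquad E,\ X{<:}A' \vdash B <: B'}{E \vdash \mathrm{All}(X{<:}A)B <: \mathrm{All}(X{<:}A')B'} \]

Terms

(Subsumption)
\[ \frac{E \vdash a : A \qquad E \vdash A <: B}{E \vdash a : B} \]

(Term x)
\[ \frac{\vdash E,\ x{:}A,\ E'\ \mathrm{env}}{E,\ x{:}A,\ E' \vdash x : A} \]

(Term top)
\[ \frac{\vdash E\ \mathrm{env}}{E \vdash \mathrm{top} : \mathrm{Top}} \]

(Term fun)
\[ \frac{E,\ x{:}A \vdash b : B}{E \vdash \mathrm{fun}(x{:}A)b : A \rightarrow B} \]

(Term appl)
\[ \frac{E \vdash b : A \rightarrow B \qquad E \vdash a : A}{E \vdash b(a) : B} \]

(Term fun2)
\[ \frac{E,\ X{<:}A \vdash b : B}{E \vdash \mathrm{fun}(X{<:}A)b : \mathrm{All}(X{<:}A)B} \]

(Term appl2)
\[ \frac{E \vdash b : \mathrm{All}(X{<:}A)B \qquad E \vdash A' <: A}{E \vdash b(:A') : B\{X \leftarrow A'\}} \]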
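In preparation for the typing algorithms, these are the same type rules expressed with de Bruijn indices. The notation $\epsilon A$ stands for either $:A$ or $<:A$. Lifting is $A{\uparrow}^i_j$ (lift by $i$ above cutoff $j$), and substitution is $B\{i \leftarrow A\}$; the latter is to be invoked as $B\{1 \leftarrow A\}$.

\[
A{\uparrow}^i = A{\uparrow}^i_0 ;\quad
n{\uparrow}^i_j = n \ (n \le j) ;\quad
n{\uparrow}^i_j = n + i \ (n > j) ;\quad
\mathrm{Top}{\uparrow}^i_j = \mathrm{Top}
\]
\[
(A \to B){\uparrow}^i_j = A{\uparrow}^i_j \to B{\uparrow}^i_{j+1} ;\quad
(\mathrm{All}({<:}A)B){\uparrow}^i_j = \mathrm{All}({<:}A{\uparrow}^i_j)\,B{\uparrow}^i_{j+1}
\]
\[
n\{i \leftarrow C\} = n \ (n < i) ;\quad
n\{n \leftarrow C\} = C{\uparrow}^{n-1} ;\quad
n\{i \leftarrow C\} = n - 1 \ (n > i) ;\quad
\mathrm{Top}\{i \leftarrow C\} = \mathrm{Top}
\]
\[
(A \to B)\{i \leftarrow C\} = A\{i \leftarrow C\} \to B\{i+1 \leftarrow C\} ;\quad
(\mathrm{All}({<:}A)B)\{i \leftarrow C\} = \mathrm{All}({<:}A\{i \leftarrow C\})\,B\{i+1 \leftarrow C\}
\]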
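
As a small worked instance of these definitions (the two types below are chosen here for illustration and do not come from the original text), recall that the right-hand side of an arrow is treated as sitting under one extra binder. Under that reading, the de Bruijn form of All(X<:Top)X->X is All(<:Top)1->2, and instantiating its body with Top unfolds as follows; the second line shows lifting by 2 with the default cutoff 0:

\[
(1 \to 2)\{1 \leftarrow \mathrm{Top}\}
= 1\{1 \leftarrow \mathrm{Top}\} \to 2\{2 \leftarrow \mathrm{Top}\}
= \mathrm{Top}{\uparrow}^{0} \to \mathrm{Top}{\uparrow}^{1}
= \mathrm{Top} \to \mathrm{Top}
\]
\[
(\mathrm{All}({<:}1)(1 \to 3)){\uparrow}^2_0
= \mathrm{All}({<:}1{\uparrow}^2_0)\,(1{\uparrow}^2_1 \to 3{\uparrow}^2_2)
= \mathrm{All}({<:}3)(1 \to 5)
\]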



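Environments

\[
\frac{}{\vdash \emptyset\ \text{env}}\ (\text{Env } \emptyset)
\qquad
\frac{E \vdash A\ \text{type}}{\vdash E,\, {:}A\ \text{env}}\ (\text{Env } x)
\qquad
\frac{E \vdash A\ \text{type}}{\vdash E,\, {<:}A\ \text{env}}\ (\text{Env } X)
\]

Types

\[
\frac{\vdash E,\, {<:}A,\, \epsilon A_{n-1}, \ldots, \epsilon A_1\ \text{env}}{E,\, {<:}A,\, \epsilon A_{n-1}, \ldots, \epsilon A_1 \vdash n\ \text{type}}\ (\text{Type } X)
\qquad
\frac{\vdash E\ \text{env}}{E \vdash \mathrm{Top}\ \text{type}}\ (\text{Type Top})
\]
\[
\frac{E,\, {:}A \vdash B\ \text{type}}{E \vdash A \to B\ \text{type}}\ (\text{Type} \to)
\qquad
\frac{E,\, {<:}A \vdash B\ \text{type}}{E \vdash \mathrm{All}({<:}A)B\ \text{type}}\ (\text{Type All})
\]

Subtypes

\[
\frac{E \vdash A\ \text{type}}{E \vdash A <: A}\ (\text{Sub refl})
\qquad
\frac{E \vdash A <: B \quad E \vdash B <: C}{E \vdash A <: C}\ (\text{Sub trans})
\]
\[
\frac{\vdash E,\, {<:}A,\, \epsilon A_{n-1}, \ldots, \epsilon A_1\ \text{env}}{E,\, {<:}A,\, \epsilon A_{n-1}, \ldots, \epsilon A_1 \vdash n <: A{\uparrow}^n}\ (\text{Sub } X)
\qquad
\frac{E \vdash A\ \text{type}}{E \vdash A <: \mathrm{Top}}\ (\text{Sub Top})
\]
\[
\frac{E \vdash A' <: A \quad E,\, {:}A' \vdash B <: B'}{E \vdash A \to B <: A' \to B'}\ (\text{Sub} \to)
\qquad
\frac{E \vdash A' <: A \quad E,\, {<:}A' \vdash B <: B'}{E \vdash \mathrm{All}({<:}A)B <: \mathrm{All}({<:}A')B'}\ (\text{Sub All})
\]

Terms

\[
\frac{E \vdash a : A \quad E \vdash A <: B}{E \vdash a : B}\ (\text{Subsumption})
\]
\[
\frac{\vdash E,\, {:}A,\, \epsilon A_{n-1}, \ldots, \epsilon A_1\ \text{env}}{E,\, {:}A,\, \epsilon A_{n-1}, \ldots, \epsilon A_1 \vdash n : A{\uparrow}^n}\ (\text{Term } x)
\qquad
\frac{\vdash E\ \text{env}}{E \vdash \mathrm{top} : \mathrm{Top}}\ (\text{Term top})
\]
\[
\frac{E,\, {:}A \vdash b : B}{E \vdash \mathrm{fun}({:}A)b : A \to B}\ (\text{Term fun})
\qquad
\frac{E \vdash b : A \to B \quad E \vdash a : A}{E \vdash b(a) : B}\ (\text{Term appl})
\]
\[
\frac{E,\, {<:}A \vdash b : B}{E \vdash \mathrm{fun}({<:}A)b : \mathrm{All}({<:}A)B}\ (\text{Term fun2})
\qquad
\frac{E \vdash b : \mathrm{All}({<:}A)B \quad E \vdash A' <: A}{E \vdash b(:A') : B\{1 \leftarrow A'\}}\ (\text{Term appl2})
\]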
Appendix E. Typing algorithm

The parsing phase eliminates all the syntax extensions. A scoping phase converts variables to de Bruijn indices, and checks that all variables are properly bound. (Variables in type contexts should be bound by type binders, while variables in term contexts should be bound by term binders.) The scoping phase also expands all the top-level type definitions. Therefore, only the following data structures have to be considered for typechecking (where n ≥ 1):

PreType

S, T = n | Top | S->T | All(<:S)T

PreTerm

a, b = n | top | fun(:S)b | b(a) | fun(<:S)b | b(:S)

Env

E = ∅ | E, <:A | E, :A

Type

A, B = n | Top | A->B | All(<:A)B
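
To make the scoping phase concrete, here is a minimal Python sketch for types only. It is not the implementation's code: the tuple encoding, the `("var", X)` surface form, and the name `scope_type` are assumptions made for illustration. It also assumes, as the de Bruijn rules of Appendix D suggest, that the right-hand side of an arrow sits under one anonymous binder.

```python
# Named types (hypothetical surface form):
#   "Top" | ("var", X) | ("->", S, T) | ("All", X, S, T)
# de Bruijn pretypes, following the PreType grammar:
#   n (int >= 1) | "Top" | ("->", S, T) | ("All", S, T)

def scope_type(ty, binders=()):
    """Convert a named type to de Bruijn form.

    `binders` lists the enclosing binders, innermost first. An arrow
    S->T pushes an anonymous binder (None) for T, mirroring the
    environment extension E,:A used for arrow types.
    """
    if ty == "Top":
        return "Top"
    tag = ty[0]
    if tag == "var":
        name = ty[1]
        if name not in binders:
            raise NameError(f"unbound type variable {name}")
        # index() finds the innermost binder, so shadowing works.
        return binders.index(name) + 1
    if tag == "->":
        _, s, t = ty
        return ("->", scope_type(s, binders),
                scope_type(t, (None,) + binders))
    if tag == "All":
        _, x, s, t = ty
        return ("All", scope_type(s, binders),
                scope_type(t, (x,) + binders))
    raise ValueError(f"not a type: {ty!r}")
```

For instance, `scope_type(("All", "X", "Top", ("->", ("var", "X"), ("var", "X"))))` yields `("All", "Top", ("->", 1, 2))`: the same variable X is index 1 on the left of the arrow and index 2 on the right.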



The following algorithms are expressed in the form of deterministic labeled transition systems [Plotkin 1981]. (They can be read much as Prolog programs.) Each kind of "arrow" defines a (functional) relation. The name of the relation is on top of the arrow, the main parameters are on the left, additional parameters are below, and results are on the right. The main parameters are, by convention, the ones subject to structural induction. The signature of each relation is given in a box, and includes parameter names as comments; the notation a:A(=b) means that b is the default value of the parameter a, when that parameter is omitted.

There is a direct correspondence between each relation and a recursive procedure in the implementation code, and between each rule and a case branch in the implementation code.

To preserve all the internal invariants, we assume that type and term are the top-level algorithms, and that they are started with an empty environment. Typechecking failure is (implicitly) represented as a "stuck" condition of the transition system.

What follows is the now well-known sound and complete algorithm for F-sub [Curien, Ghelli 1991]. Giorgio Ghelli has shown that this algorithm diverges in some situations where it should fail, and Benjamin Pierce has further shown that the type system of Appendix D is undecidable [Pierce 1992].



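Lift increases indices above a cutoff index.

\[
\text{type} : \text{Type}\ \xrightarrow[\text{by} : \text{Int},\ \text{cutoff} : \text{Int}(=0)]{\text{lift}}\ \text{result} : \text{Type}
\]

\[
n \xrightarrow[i,j]{\text{lift}} n \ \ (n \le j)
\qquad
n \xrightarrow[i,j]{\text{lift}} n + i \ \ (n > j)
\qquad
\mathrm{Top} \xrightarrow[i,j]{\text{lift}} \mathrm{Top}
\]
\[
\frac{A \xrightarrow[i,j]{\text{lift}} A' \quad B \xrightarrow[i,j+1]{\text{lift}} B'}{A \to B \xrightarrow[i,j]{\text{lift}} A' \to B'}
\qquad
\frac{A \xrightarrow[i,j]{\text{lift}} A' \quad B \xrightarrow[i,j+1]{\text{lift}} B'}{\mathrm{All}({<:}A)B \xrightarrow[i,j]{\text{lift}} \mathrm{All}({<:}A')B'}
\]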



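Replace performs a substitution and lowers the free indices.

\[
\text{type} : \text{Type}\ \xrightarrow[\text{index} : \text{Int},\ \text{with} : \text{Type},\ \text{lift} : \text{Int}(=0)]{\text{replace}}\ \text{result} : \text{Type}
\]

\[
n \xrightarrow[i,C,l]{\text{replace}} n \ \ (n < i)
\qquad
n \xrightarrow[i,C,l]{\text{replace}} n - 1 \ \ (n > i)
\qquad
\mathrm{Top} \xrightarrow[i,C,l]{\text{replace}} \mathrm{Top}
\qquad
\frac{C \xrightarrow[l]{\text{lift}} C'}{n \xrightarrow[n,C,l]{\text{replace}} C'}
\]
\[
\frac{A \xrightarrow[i,C,l]{\text{replace}} A' \quad B \xrightarrow[i+1,C,l+1]{\text{replace}} B'}{A \to B \xrightarrow[i,C,l]{\text{replace}} A' \to B'}
\qquad
\frac{A \xrightarrow[i,C,l]{\text{replace}} A' \quad B \xrightarrow[i+1,C,l+1]{\text{replace}} B'}{\mathrm{All}({<:}A)B \xrightarrow[i,C,l]{\text{replace}} \mathrm{All}({<:}A')B'}
\]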
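
The lift and replace relations can be read as two small recursive procedures. The following Python sketch is one such reading, not the implementation's actual code: it assumes a tuple encoding of types (an `int` index, the string `"Top"`, `("->", A, B)`, `("All", A, B)`), and it assumes that the boundary case of lift is `n <= cutoff`, which is what the default cutoff 0 with indices starting at 1 requires.

```python
def lift(ty, by, cutoff=0):
    """Add `by` to every index strictly above `cutoff`."""
    if isinstance(ty, int):
        return ty if ty <= cutoff else ty + by
    if ty == "Top":
        return "Top"
    tag, a, b = ty  # ("->", A, B) or ("All", A, B)
    # The second component sits under one more binder.
    return (tag, lift(a, by, cutoff), lift(b, by, cutoff + 1))


def replace(ty, index, with_, lift_by=0):
    """Substitute `with_` for `index` and lower the indices above it."""
    if isinstance(ty, int):
        if ty < index:
            return ty
        if ty > index:
            return ty - 1
        # ty == index: bring `with_` under the binders crossed so far.
        return lift(with_, lift_by)
    if ty == "Top":
        return "Top"
    tag, a, b = ty
    return (tag,
            replace(a, index, with_, lift_by),
            replace(b, index + 1, with_, lift_by + 1))
```

Keeping `index` and `lift_by` in step is the point of the extra `lift` parameter: each binder crossed raises both by one, so the substituted type is lifted by exactly the number of binders between the redex and the occurrence.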



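TypeIde extracts the bound of a type identifier from an environment.

\[
\text{env} : \text{Env}\ \xrightarrow[\text{index} : \text{Int},\ \text{depth} : \text{Int}(=\text{index})]{\text{typeIde}}\ \text{result} : \text{Type}
\]

\[
\frac{A \xrightarrow[d]{\text{lift}} A'}{E,\, {<:}A \xrightarrow[1,d]{\text{typeIde}} A'}
\qquad
\frac{E \xrightarrow[n,d]{\text{typeIde}} B}{E,\, {<:}A \xrightarrow[n+1,d]{\text{typeIde}} B}
\qquad
\frac{E \xrightarrow[n,d]{\text{typeIde}} B}{E,\, {:}A \xrightarrow[n+1,d]{\text{typeIde}} B}
\]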



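TermIde extracts the type of a term identifier from an environment.

\[
\text{env} : \text{Env}\ \xrightarrow[\text{index} : \text{Int},\ \text{depth} : \text{Int}(=\text{index})]{\text{termIde}}\ \text{result} : \text{Type}
\]

\[
\frac{A \xrightarrow[d]{\text{lift}} A'}{E,\, {:}A \xrightarrow[1,d]{\text{termIde}} A'}
\qquad
\frac{E \xrightarrow[n,d]{\text{termIde}} B}{E,\, {<:}A \xrightarrow[n+1,d]{\text{termIde}} B}
\qquad
\frac{E \xrightarrow[n,d]{\text{termIde}} B}{E,\, {:}A \xrightarrow[n+1,d]{\text{termIde}} B}
\]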
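
Both lookups walk the environment from the right and lift the entry they find by the original index, so that the returned type makes sense in the whole environment. A possible Python rendering follows; the list-of-pairs environment (rightmost component last), the helper name `lookup`, and the use of `LookupError` for the "stuck" case are all assumptions of this sketch.

```python
def lift(ty, by, cutoff=0):
    """Add `by` to every index strictly above `cutoff`."""
    if isinstance(ty, int):
        return ty if ty <= cutoff else ty + by
    if ty == "Top":
        return "Top"
    tag, a, b = ty
    return (tag, lift(a, by, cutoff), lift(b, by, cutoff + 1))


def lookup(env, index, kind):
    """Fetch the entry at de Bruijn position `index` (1 = rightmost).

    `env` is a list of (kind, type) pairs with kind "<:" or ":".
    The stored type was formed in the environment to the left of its
    entry, so it is lifted by `index` (the default `depth`).
    """
    if not 1 <= index <= len(env):
        raise LookupError("stuck: index out of range")
    entry_kind, ty = env[-index]
    if entry_kind != kind:
        raise LookupError("stuck: identifier of the wrong kind")
    return lift(ty, index)


def type_ide(env, n):
    return lookup(env, n, "<:")


def term_ide(env, n):
    return lookup(env, n, ":")
```

With `env = [("<:", "Top"), ("<:", 1), (":", 2)]`, the call `type_ide(env, 2)` returns `3`: the bound of the second variable was index 1 in its own context, which is index 3 in the full environment.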



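ExposeArrow strips type variables until it finds an arrow type.

\[
\text{type} : \text{Type}\ \xrightarrow[\text{env} : \text{Env}]{\text{exposeArrow}}\ \text{outType} : \text{Type}
\]

\[
\frac{E \xrightarrow[n]{\text{typeIde}} A \quad A \xrightarrow[E]{\text{exposeArrow}} A'}{n \xrightarrow[E]{\text{exposeArrow}} A'}
\qquad
A \to B \xrightarrow[E]{\text{exposeArrow}} A \to B
\]

ExposeAll strips type variables until it finds a forall type.

\[
\text{type} : \text{Type}\ \xrightarrow[\text{env} : \text{Env}]{\text{exposeAll}}\ \text{outType} : \text{Type}
\]

\[
\frac{E \xrightarrow[n]{\text{typeIde}} A \quad A \xrightarrow[E]{\text{exposeAll}} A'}{n \xrightarrow[E]{\text{exposeAll}} A'}
\qquad
\mathrm{All}({<:}A)B \xrightarrow[E]{\text{exposeAll}} \mathrm{All}({<:}A)B
\]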



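Sub tests subtyping between two types.

\[
\text{small}, \text{big} : \text{Type}\ \xrightarrow[\text{env} : \text{Env}]{\text{sub}}\ \text{result} : \{\text{ok}\}
\]

\[
A, \mathrm{Top} \xrightarrow[E]{\text{sub}} \text{ok}
\qquad
n, n \xrightarrow[E]{\text{sub}} \text{ok}
\qquad
\frac{E \xrightarrow[n]{\text{typeIde}} A \quad A, B \xrightarrow[E]{\text{sub}} \text{ok}}{n, B \xrightarrow[E]{\text{sub}} \text{ok}}\ \ (B \ne n,\ B \ne \mathrm{Top})
\]
\[
\frac{A', A \xrightarrow[E]{\text{sub}} \text{ok} \quad B, B' \xrightarrow[E,\,{:}A']{\text{sub}} \text{ok}}{A \to B,\ A' \to B' \xrightarrow[E]{\text{sub}} \text{ok}}
\qquad
\frac{A', A \xrightarrow[E]{\text{sub}} \text{ok} \quad B, B' \xrightarrow[E,\,{<:}A']{\text{sub}} \text{ok}}{\mathrm{All}({<:}A)B,\ \mathrm{All}({<:}A')B' \xrightarrow[E]{\text{sub}} \text{ok}}
\]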
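
Read as a procedure, the sub relation is a syntax-directed recursion with one case per rule. The Python sketch below is an illustrative rendering under assumptions of its own: types are encoded as tuples, the environment is a list of `(kind, type)` pairs with the rightmost component last, and a stuck state is reported as `False` rather than left implicit. Since the algorithm can diverge on some inputs, a real run of this sketch may end in Python's `RecursionError` instead of an answer.

```python
def lift(ty, by, cutoff=0):
    """Add `by` to every index strictly above `cutoff`."""
    if isinstance(ty, int):
        return ty if ty <= cutoff else ty + by
    if ty == "Top":
        return "Top"
    tag, a, b = ty
    return (tag, lift(a, by, cutoff), lift(b, by, cutoff + 1))


def type_ide(env, n):
    """Bound of type variable `n`, lifted into the whole environment."""
    kind, bound = env[-n]
    if kind != "<:":
        raise LookupError("stuck: not a type variable")
    return lift(bound, n)


def sub(small, big, env):
    """Return True when `small <: big` is derivable in `env`."""
    if big == "Top":
        return True
    if isinstance(small, int):
        if small == big:
            return True
        # Promote the variable to its bound and retry.
        return sub(type_ide(env, small), big, env)
    if (isinstance(small, tuple) and isinstance(big, tuple)
            and small[0] == big[0]):
        tag, a, b = small
        _, a2, b2 = big
        kind = ":" if tag == "->" else "<:"
        # Contravariant on the left, covariant on the right; the right
        # sides are compared under the smaller bound a2.
        return sub(a2, a, env) and sub(b, b2, env + [(kind, a2)])
    return False
```

For example, with `env = [("<:", "Top"), ("<:", 1)]` (that is, X<:Top, Y<:X), `sub(1, 2, env)` holds because Y's bound is X, while `sub(2, 1, env)` does not.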



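Type checks that a pretype is well-formed and returns the corresponding type.

\[
\text{pre} : \text{PreType}\ \xrightarrow[\text{env} : \text{Env}(=\emptyset)]{\text{type}}\ \text{result} : \text{Type}
\]

\[
\frac{E \xrightarrow[n]{\text{typeIde}} A}{n \xrightarrow[E]{\text{type}} n}
\qquad
\mathrm{Top} \xrightarrow[E]{\text{type}} \mathrm{Top}
\]
\[
\frac{S \xrightarrow[E]{\text{type}} A \quad T \xrightarrow[E,\,{:}A]{\text{type}} B}{S \to T \xrightarrow[E]{\text{type}} A \to B}
\qquad
\frac{S \xrightarrow[E]{\text{type}} A \quad T \xrightarrow[E,\,{<:}A]{\text{type}} B}{\mathrm{All}({<:}S)T \xrightarrow[E]{\text{type}} \mathrm{All}({<:}A)B}
\]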
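Term checks that a preterm is well-typed and returns its type.

\[
\text{pre} : \text{PreTerm}\ \xrightarrow[\text{env} : \text{Env}(=\emptyset)]{\text{term}}\ \text{result} : \text{Type}
\]

\[
\frac{E \xrightarrow[n]{\text{termIde}} A}{n \xrightarrow[E]{\text{term}} A}
\qquad
\mathrm{top} \xrightarrow[E]{\text{term}} \mathrm{Top}
\]
\[
\frac{S \xrightarrow[E]{\text{type}} A \quad b \xrightarrow[E,\,{:}A]{\text{term}} B}{\mathrm{fun}({:}S)b \xrightarrow[E]{\text{term}} A \to B}
\qquad
\frac{S \xrightarrow[E]{\text{type}} A \quad b \xrightarrow[E,\,{<:}A]{\text{term}} B}{\mathrm{fun}({<:}S)b \xrightarrow[E]{\text{term}} \mathrm{All}({<:}A)B}
\]
\[
\frac{b \xrightarrow[E]{\text{term}} C \quad C \xrightarrow[E]{\text{exposeArrow}} A' \to B \quad a \xrightarrow[E]{\text{term}} A \quad A, A' \xrightarrow[E]{\text{sub}} \text{ok} \quad B \xrightarrow[1,A]{\text{replace}} B'}{b(a) \xrightarrow[E]{\text{term}} B'}
\]
\[
\frac{b \xrightarrow[E]{\text{term}} C \quad C \xrightarrow[E]{\text{exposeAll}} \mathrm{All}({<:}A')B \quad S \xrightarrow[E]{\text{type}} A \quad A, A' \xrightarrow[E]{\text{sub}} \text{ok} \quad B \xrightarrow[1,A]{\text{replace}} B'}{b(:S) \xrightarrow[E]{\text{term}} B'}
\]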
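
Putting the relations of this appendix together gives a complete, if tiny, checker. The sketch below is one possible Python transcription and makes several assumptions that are not in the original: the tuple encodings of types and preterms (the tags `"fun"`, `"Fun"`, `"app"`, `"App"` are hypothetical), a single `lookup` and `expose` helper covering both variants, and an exception `Stuck` standing for a stuck transition. Like the algorithm it follows, it may fail to terminate on some ill-typed inputs.

```python
class Stuck(Exception):
    """Typechecking failure (a stuck transition)."""

def lift(ty, by, cutoff=0):
    if isinstance(ty, int):
        return ty if ty <= cutoff else ty + by
    if ty == "Top":
        return ty
    tag, a, b = ty
    return (tag, lift(a, by, cutoff), lift(b, by, cutoff + 1))

def replace(ty, i, c, l=0):
    if isinstance(ty, int):
        return ty if ty < i else ty - 1 if ty > i else lift(c, l)
    if ty == "Top":
        return ty
    tag, a, b = ty
    return (tag, replace(a, i, c, l), replace(b, i + 1, c, l + 1))

def lookup(env, n, kind):            # typeIde ("<:") and termIde (":")
    if not 1 <= n <= len(env) or env[-n][0] != kind:
        raise Stuck(f"bad index {n}")
    return lift(env[-n][1], n)

def expose(ty, env, tag):            # exposeArrow / exposeAll
    while isinstance(ty, int):
        ty = lookup(env, ty, "<:")
    if ty == "Top" or ty[0] != tag:
        raise Stuck(f"expected a {tag} type")
    return ty

def sub(small, big, env):
    if big == "Top" or (small == big and isinstance(small, int)):
        return
    if isinstance(small, int):
        return sub(lookup(env, small, "<:"), big, env)
    if small == "Top" or isinstance(big, int) or small[0] != big[0]:
        raise Stuck("not a subtype")
    (tag, a, b), (_, a2, b2) = small, big
    sub(a2, a, env)
    sub(b, b2, env + [(":" if tag == "->" else "<:", a2)])

def type_(pre, env):
    if isinstance(pre, int):
        lookup(env, pre, "<:")       # must be bound by a type binder
        return pre
    if pre == "Top":
        return pre
    tag, s, t = pre
    a = type_(s, env)
    return (tag, a, type_(t, env + [(":" if tag == "->" else "<:", a)]))

def term(pre, env=()):
    env = list(env)
    if isinstance(pre, int):
        return lookup(env, pre, ":")
    if pre == "top":
        return "Top"
    tag = pre[0]
    if tag in ("fun", "Fun"):        # fun(:S)b and fun(<:S)b
        a = type_(pre[1], env)
        kind, con = (":", "->") if tag == "fun" else ("<:", "All")
        return (con, a, term(pre[2], env + [(kind, a)]))
    if tag == "app":                 # b(a)
        _, a2, b = expose(term(pre[1], env), env, "->")
        a = term(pre[2], env)
        sub(a, a2, env)
        return replace(b, 1, a)
    if tag == "App":                 # b(:S)
        _, a2, b = expose(term(pre[1], env), env, "All")
        a = type_(pre[2], env)
        sub(a, a2, env)
        return replace(b, 1, a)
    raise Stuck(f"not a preterm: {pre!r}")
```

As a usage example, the polymorphic identity `("Fun", "Top", ("fun", 1, 1))` receives the type `("All", "Top", ("->", 1, 2))`, and applying it first to the type Top and then to the term top yields Top.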
Appendix F. Typing algorithm with argument synthesis

We now extend the typing algorithm of Appendix E with type inference. The inference mechanism is based syntactically on [Pollack 1990], and algorithmically on [Miller (to appear)].
The necessary data structures are as follows, where q is either ? or empty.

PreType

S, T = n | Top | S->T | All(q<:S)T

PreTerm

a, b = n | n! | top | fun(:S)b | b(a) | fun(q<:S)b | b(:S)

Env

E = ∅ | E, <:A | E, :A

Type

A, B = α | n | Top | A->B | All(q<:A)B

Subst

σ = ∅ | α_r, σ | α←A, σ | σ⇑ | σ⇓

A substitution σ binds unification variables that may occur in terms, types and environments. An instantiated variable appears as α←A in the substitution. A non-instantiated variable appears as α_r in the substitution, where the rank r is an index into an environment. Rank 1 points to the right of the rightmost component of the environment; rank 2 points between the rightmost component and the one to its left, and so on. This information encodes a mixed prefix [Miller (to appear)]: universally quantified variables are represented by de Bruijn indices into the environments, while existentially quantified (unification) variables have ranks pointing between components of the environment. (Therefore, the order of universal quantifiers matters, but the order of contiguous existential quantifiers does not.)

The operations σ⇑ and σ⇓ shift all the (free) de Bruijn indices and ranks in σ by +1 or -1.

Before describing the algorithm, we give some properties of substitutions. Here is how the substitution 
shifts alt and all are normalized away, and how a normalized substitution a is applied to a type A (via 
A{a}[]). The order of occurrence of variables in a normalized substitution is not important. 

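\[
\sigma{\Uparrow} = \sigma{\uparrow}^{1} ;\quad
\sigma{\Downarrow} = \sigma{\uparrow}^{-1} ;\quad
\emptyset{\uparrow}^{i} = \emptyset ;\quad
(\alpha \leftarrow A, \sigma){\uparrow}^{i} = \alpha \leftarrow A{\uparrow}^{i},\ \sigma{\uparrow}^{i} ;\quad
(\alpha_r, \sigma){\uparrow}^{i} = \alpha_{r+i},\ \sigma{\uparrow}^{i}
\]
\[
A{\uparrow}^{i} = A{\uparrow}^{i}_{0} ;\quad
\alpha{\uparrow}^{i}_{j} = \alpha ;\quad
n{\uparrow}^{i}_{j} = n \ (n \le j) ;\quad
n{\uparrow}^{i}_{j} = n + i \ (n > j) ;\quad
\mathrm{Top}{\uparrow}^{i}_{j} = \mathrm{Top}
\]
\[
(A \to B){\uparrow}^{i}_{j} = A{\uparrow}^{i}_{j} \to B{\uparrow}^{i}_{j+1} ;\quad
(\mathrm{All}(q{<:}A)B){\uparrow}^{i}_{j} = \mathrm{All}(q{<:}A{\uparrow}^{i}_{j})\,B{\uparrow}^{i}_{j+1}
\]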

aia^aY; = a; a{a <- A,a}\ = A{a}° ; n{a}/ = n (n<j); n{a}/ = n + i (n>j) 
Top{a}/ = Top ; (A- > B){a}/ = A{a}/- > s{a}/ +1 ; (All (q < : A)B){o}{ = All (q < : Airf^Bicr}^ 1 

A substitution is applied to a judgment, for example a typing judgment, as follows. Indices decrease while 
moving into the environment, but all indices remain positive because of rank restrictions. 

(Eh A type) {a}/ = E{a}{ h A{a}\ type 

0{a}/=0; (£,:A){ff}/=£{a}/. 1 ,:A{a}/. 1 ; (E,< : A){a}/ = E{a\U,< : A\a)U 
Here is how an index-1 replacement operation $A\{1 \leftarrow C\}$ is applied to a type A containing unification variables. This operation occurs in the context of some substitution σ; the case $\alpha\{i \leftarrow C\} = \alpha$ is justified by rank restrictions that ensure that whatever α is instantiated to, it cannot depend on i.

\[
\alpha\{i \leftarrow C\} = \alpha ;\quad
n\{i \leftarrow C\} = n \ (n < i) ;\quad
n\{n \leftarrow C\} = C{\uparrow}^{n-1} ;\quad
n\{i \leftarrow C\} = n - 1 \ (n > i) ;\quad
\mathrm{Top}\{i \leftarrow C\} = \mathrm{Top}
\]
\[
(A \to B)\{i \leftarrow C\} = A\{i \leftarrow C\} \to B\{i+1 \leftarrow C\} ;\quad
(\mathrm{All}(q{<:}A)B)\{i \leftarrow C\} = \mathrm{All}(q{<:}A\{i \leftarrow C\})\,B\{i+1 \leftarrow C\}
\]

The notation σ\α indicates removing α←A or α_r from σ.

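Lift increases indices above a cutoff index.

\[
\text{type} : \text{Type}\ \xrightarrow[\text{by} : \text{Int},\ \text{cutoff} : \text{Int}(=0)]{\text{lift}}\ \text{result} : \text{Type}
\]

\[
\alpha \xrightarrow[i,j]{\text{lift}} \alpha
\qquad
n \xrightarrow[i,j]{\text{lift}} n \ \ (n \le j)
\qquad
n \xrightarrow[i,j]{\text{lift}} n + i \ \ (n > j)
\qquad
\mathrm{Top} \xrightarrow[i,j]{\text{lift}} \mathrm{Top}
\]
\[
\frac{A \xrightarrow[i,j]{\text{lift}} A' \quad B \xrightarrow[i,j+1]{\text{lift}} B'}{A \to B \xrightarrow[i,j]{\text{lift}} A' \to B'}
\qquad
\frac{A \xrightarrow[i,j]{\text{lift}} A' \quad B \xrightarrow[i,j+1]{\text{lift}} B'}{\mathrm{All}(q{<:}A)B \xrightarrow[i,j]{\text{lift}} \mathrm{All}(q{<:}A')B'}
\]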



Retrieve retrieves the instance or rank of a type variable from a substitution. 



subst : Subst - 



retrieve 



lift : Int( = 0), var : TypeVar 



^instance : Type U rank : Int 



lift 
A— >A ' 
i 



retrieve 

a <r- A, a >A ' 

i,a 



retrieve 

a >A 



_ retrieve 
p <— B,a >A 

i,a 



retrieve 

a r ,a >r + 1 

i,a 



retrieve 

a >a 

i,a 

_ retrieve 
i,a 



retrieve 
G >A 

i + l,a 

a retrieve 

a\\ >a 

i,a 



retrieve 

a >a 

i-l,a 

ii retrieve 
CW >A 

i,a 



Replace performs a substitution with an index shift of -1 on "free indices". A type variable shall not depend 
on the replacement index. 



type : Type >result : Type 

index : Int, with : Type 
lift

c - >C 

replace replace , t n . i replace , , „ replace 

a— >a n— >n (n< i) n— >n - 1 (n > i) Top— >Top 

i,C i,C n r6P tC i>C i,C 

n,C 



replace replace replace replace 

A — >A' B — >B' A — >A' B — >B ' 

i,C i + lC i,C i + lC 



replace replace 

A- > B — >A '—> B ' All (g < : A)B — >A11 (q < : A )B 

i, C i, C 



Strip expands question-mark quantifiers and introduces ranked variables in substitutions. 



strip 

type : Type \>outType : Type, outSubst : Subst 

subst : Subst 



retrieve strip , retrieve 

a >a a >A',a a \>r 

a a a stn P 



strip , strip 

a — —>A',a a — -xx,o 

a a 



A ->A,(J (A ^ a, A ^ All (? <:C)B) 



replace strip replace strip 

B >B ' B >B",a B — >B ' B — >B " (7 

la a,,a , l a a , , 

(anew in a) : (A ^ Top) 



strip , strip 

All (? <:Top)B ->B",(7 All ( ? < : A)B ->B",<T 



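TypeIde extracts the bound of a type identifier from an environment.

\[
\text{env} : \text{Env}\ \xrightarrow[\text{index} : \text{Int},\ \text{depth} : \text{Int}(=\text{index})]{\text{typeIde}}\ \text{result} : \text{Type}
\]

\[
\frac{A \xrightarrow[d]{\text{lift}} A'}{E,\, {<:}A \xrightarrow[1,d]{\text{typeIde}} A'}
\qquad
\frac{E \xrightarrow[n,d]{\text{typeIde}} B}{E,\, {<:}A \xrightarrow[n+1,d]{\text{typeIde}} B}
\qquad
\frac{E \xrightarrow[n,d]{\text{typeIde}} B}{E,\, {:}A \xrightarrow[n+1,d]{\text{typeIde}} B}
\]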



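TermIde extracts the type of a term identifier from an environment.

\[
\text{env} : \text{Env}\ \xrightarrow[\text{index} : \text{Int},\ \text{depth} : \text{Int}(=\text{index})]{\text{termIde}}\ \text{result} : \text{Type}
\]

\[
\frac{A \xrightarrow[d]{\text{lift}} A'}{E,\, {:}A \xrightarrow[1,d]{\text{termIde}} A'}
\qquad
\frac{E \xrightarrow[n,d]{\text{termIde}} B}{E,\, {<:}A \xrightarrow[n+1,d]{\text{termIde}} B}
\qquad
\frac{E \xrightarrow[n,d]{\text{termIde}} B}{E,\, {:}A \xrightarrow[n+1,d]{\text{termIde}} B}
\]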
ExposeArrow strips variables until it can find or generate an arrow type.



„ exposeArrow 

type : Type \>outType : Type, outSubst : Subst 

env : Env, subst : Subst 



^ typelde exposeArrow 

E— >A A — >A',a' 

n E a exposeArrow 

-. — A— > B — >A- > B, a 

exposeArrow , p a 

n— >A',a' '° 

E,a 

retrieve exposeArrow 
O >A A >A',G 

a Ea_ 

exposeArrow 

a— >A \a 

E,a 



retrieve 

G >r 

a 



exposeArrow „ „ 

a— Ha ->a ' ),(a<-(a ->a ),a r ,a r ,(a \ a)) 

E,a 



(a', a" new in a) 



ExposeAll strips variables until it can find a forall type or can generate a (non-?) forall type.



exposeAll 

type : Type >outType : Type, outSubst : Subst 

env : Env, subst : Subst 



^ typelde exposeAll 

E— >A A — >A',a' 

n E a exposeAll 

, „ ' All (q<:A)B— >A11 (q < : A)B, <J 

exposeAll , p r> 

E,a 

retrieve exposeAll 

a >a a — >A',a 

a E,a 



exposeAll 

a— >A 

E,a 



retrieve 

a \>r 

a 



exposeAll 

a— c<All (< :a')a"), (a <- (All (<-.a')a"),a' r ,a" r ,(a \ a)) 

E,a 



(a', a" new in a) 



OccurCheck tests for circular instantiations and rank violations (variable captures). 



occurCheck , , 

type : Type ^result : Subst 

var : TypeVar, varRank : Int, subst : Subst, level : Int( = 0) 



retrieve 
a >s 

P 



occurCheck 

P —>P r >(° \ P) 

a, r, a, i 

retrieve 



retrieve 
a >s 

(«*A*<r) occurCheck (^P,s>r) 

P 1X7 



P 



occurCheck 
>B B 1X7 

a, r, a, i 



_ occurCheck 
P 1X7 

a, r, a, i 



a, r, a, i 



, „, occurCheck , , . occurCheck 
(a^p) n ; — cx7 (-i(n > i a n < r)) Top X7 



a, r, a, i 



a, r, a, i 



occurCheck , occurCheck 

A XT' B s X7" 

a, r, a, i a, r + \,o'\\,i + 1 

occurCheck „ M 

A— > B XT ii 

a, r, a, i 



occurCheck , occurCheck 

A XT' B s X7" 

a, r, a, i a,r + \,a'\\,i + \ 

occurCheck „ » 
All (g < : A)B 1X7 

a, r, a, i 



Sub tests subtyping between two types. 



small, big : Type 



sub 



env : Env, substln : Subst 



->substOut : Subst 



sub sub 

A,Top 1X7 a,a ix7 

E, a E,a 



retrieve sub 
O >A A, B X7 

a E,a 

sub 

a,B xj 

E,a 



retrieve sub 
O >B A, B 1X7 

(B*a, B^Top) £ ^ (A?P) 

A,B- >E,G 

E,a 



retrieve occurCheck retrieve occurCheck 

a or B X7' C7 t>r A X7 

a a,r,<7 , , P P,r,CT , , 

- - (Bt^B, B ^ Top) ^ ^ ^ (A?P) 



a,B xx <- b, (a \a) 

E,G 



A,P—>P<-A,(<T , \p) 

E,a 



^ typelde sub 

E— >A A,B X7' 

su " n E, a , / \ 

n, n 1X7 r (B ^ 7, B ^ n, B ^ Top) 

£ " tT n,B XT' 

E,a 



sub , iw/j „ sub , sub 

A', A 1X7 B,B' K >a A', A XT B, B ' X >(T" 

E,a E,: A', <7'1t E,a E,<: A', <7'1T 



A- > B, A '- > B ' WJ'M 

E,a 



sub 

All (g <:A)B,A11 (g < : A ')B ' X7" 

E,a 
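Type checks that a pretype is well-formed and returns the corresponding type.

\[
\text{pre} : \text{PreType}\ \xrightarrow[\text{env} : \text{Env}(=\emptyset)]{\text{type}}\ \text{result} : \text{Type}
\]

\[
\frac{E \xrightarrow[n]{\text{typeIde}} A}{n \xrightarrow[E]{\text{type}} n}
\qquad
\mathrm{Top} \xrightarrow[E]{\text{type}} \mathrm{Top}
\]
\[
\frac{S \xrightarrow[E]{\text{type}} A \quad T \xrightarrow[E,\,{:}A]{\text{type}} B}{S \to T \xrightarrow[E]{\text{type}} A \to B}
\qquad
\frac{S \xrightarrow[E]{\text{type}} A \quad T \xrightarrow[E,\,{<:}A]{\text{type}} B}{\mathrm{All}(q{<:}S)T \xrightarrow[E]{\text{type}} \mathrm{All}(q{<:}A)B}
\]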



Term checks that a preterm is well-typed and returns its type.



term 

pre : PreTerm >type : Type, substOut : Subst 

env : Env( =0 ), substln : Subst 



„ termlde 

E >A 



strip 

A — ->a ',a 
E,a 



term 
n >A ',(7 



E,a 



„ termlde 

E >A 



n 



term 
n\ >A,a 



E,a 



term 

top >Top, a 

E,a 



type 
E 



term 

b -K>B,a 

E,:A,aV 



term , , 

fun ( :S)b OA- > B,a'\ 

E,a 



type 



term 

b 

E,<:A, crlT 



term , i 

fun (q <:S)b >A11 (q <:A)B,a'\ 

E,a 



term , exposeArrow 

b >C,a C— >a'->b,o 

E,a a' 



term sub , replace 
a \>A,p A, A' >p B — >B 1 

E, a" E,p 1, A 



term 

b(a) >B \p' 

E,o 



term 

b >C,o' 

E,o 



exposeAll type 
C— >A11 (q < : A ')B, p S -^—>A 



sub 

A, A ' >p 

Ep 



replace 

B — — >B ' 

1, A 



term 

b( :S) >B \p' 

E,a 
Appendix G. Recursion

As explained in section 7, two recursive types are equal only if the Rec bindings occur in the same positions. That is, unfolding a recursion does not produce, in general, an identical type. Since variables are replaced by de Bruijn indices, equality of recursive types is then simple identity.

Still, precisely because of de Bruijn indices, the subtyping test for recursive types is not trivial. The formal subtyping rule requires that the bodies of two recursive types be tested for inclusion under the assumption that the corresponding variables are included in one direction. But since the de Bruijn indices are identical in both types, they will match in both directions. For example, Rec(X)X->Bool <: Rec(Y)Y->Top should fail, while the de Bruijn version Rec()1->Bool <: Rec()1->Top would, naively, succeed. Hence, before testing the bodies we compute the ties between the recursion variables, which can be positive (covariant), negative (contravariant), or both. If the ties are only positive, we test the bodies for inclusion; otherwise we test the bodies for equality (inclusion in both directions). The ties are computed by mimicking a subtype test.

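G.1. Typing rules

Contractivity relation ($A \succ X$)

\[
Y \succ X \iff Y \ne X ;\quad
\mathrm{Top} \succ X ;\quad
(A \to B) \succ X ;\quad
(\mathrm{All}(Y <: A)B) \succ X
\]
\[
(\mathrm{Rec}(Y)B) \succ X \iff B \succ X \ \wedge\ B \succ Y \ \wedge\ Y \ne X
\]

Types

\[
\frac{E,\, X <: \mathrm{Top} \vdash B\ \text{type}}{E \vdash \mathrm{Rec}(X)B\ \text{type}}\ (\text{Type Rec})\ \ (B \succ X)
\]

Subtypes

\[
\frac{E \vdash \mathrm{Rec}(X)B\ \text{type} \quad E \vdash \mathrm{Rec}(Y)C\ \text{type} \quad E,\, Y <: \mathrm{Top},\, X <: Y \vdash B <: C}{E \vdash \mathrm{Rec}(X)B <: \mathrm{Rec}(Y)C}\ (\text{Sub Rec})
\]

Terms

\[
\frac{E \vdash a : B\{X \leftarrow \mathrm{Rec}(X)B\}}{E \vdash \mathrm{fold}(:\mathrm{Rec}(X)B)(a) : \mathrm{Rec}(X)B}\ (\text{Term fold})
\qquad
\frac{E \vdash a : \mathrm{Rec}(X)B}{E \vdash \mathrm{unfold}(a) : B\{X \leftarrow \mathrm{Rec}(X)B\}}\ (\text{Term unfold})
\]
\[
\frac{E,\, x : A \vdash a : A}{E \vdash \mathrm{rec}(x : A)a : A}\ (\text{Term rec})
\]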
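This is now the de Bruijn-index version.

\[
(\mathrm{Rec}()B){\uparrow}^i_j = \mathrm{Rec}()\,B{\uparrow}^i_{j+1}
\qquad
(\mathrm{Rec}()B)\{i \leftarrow C\} = \mathrm{Rec}()\,B\{i+1 \leftarrow C\}
\]
\[
A \succ\ \iff A \succ 1 ;\quad
n \succ m \iff n \ne m ;\quad
\mathrm{Top} \succ n ;\quad
(A \to B) \succ n ;\quad
(\mathrm{All}({<:}A)B) \succ n
\]
\[
(\mathrm{Rec}()B) \succ n \iff B \succ n + 1 \ \wedge\ B \succ 1
\]

Types

\[
\frac{E,\, {<:}\mathrm{Top} \vdash B\ \text{type}}{E \vdash \mathrm{Rec}()B\ \text{type}}\ (\text{Type Rec})\ \ (B \succ)
\]

Subtypes

\[
\frac{E,\, {<:}\mathrm{Top},\, {<:}1 \vdash B{\uparrow}^1_1 <: C{\uparrow}^1_0}{E \vdash \mathrm{Rec}()B <: \mathrm{Rec}()C}\ (\text{Sub Rec})
\]

Terms

\[
\frac{E \vdash a : B\{1 \leftarrow \mathrm{Rec}()B\}}{E \vdash \mathrm{fold}(:\mathrm{Rec}()B)(a) : \mathrm{Rec}()B}\ (\text{Term fold})
\qquad
\frac{E \vdash a : \mathrm{Rec}()B}{E \vdash \mathrm{unfold}(a) : B\{1 \leftarrow \mathrm{Rec}()B\}}\ (\text{Term unfold})
\]
\[
\frac{E,\, {:}A \vdash a : A{\uparrow}^1}{E \vdash \mathrm{rec}({:}A)a : A}\ (\text{Term rec})
\]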
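G.2. Typing algorithm

PreType

S, T = ... | Rec()T

PreTerm

a, b = ... | fold(:T)(a) | unfold(a) | rec(:S)b

Type

A, B = ... | Rec()B

\[
\frac{B \xrightarrow[i,j+1]{\text{lift}} B'}{\mathrm{Rec}()B \xrightarrow[i,j]{\text{lift}} \mathrm{Rec}()B'}
\qquad
\frac{B \xrightarrow[i+1,C,l+1]{\text{replace}} B'}{\mathrm{Rec}()B \xrightarrow[i,C,l]{\text{replace}} \mathrm{Rec}()B'}
\qquad
\frac{T \xrightarrow[E,\,{<:}\mathrm{Top}]{\text{type}} B \quad B \xrightarrow{\text{contracts}} \text{ok}}{\mathrm{Rec}()T \xrightarrow[E]{\text{type}} \mathrm{Rec}()B}
\]
\[
\frac{B, C \xrightarrow[E,\,{<:}\mathrm{Top}]{\text{ties}} \zeta \quad B, C \xrightarrow[E,\,{<:}\mathrm{Top}]{\text{sub}} \text{ok}}{\mathrm{Rec}()B,\ \mathrm{Rec}()C \xrightarrow[E]{\text{sub}} \text{ok}}\ \ (\zeta \subseteq \{\text{pos}\})
\]
\[
\frac{B, C \xrightarrow[E,\,{<:}\mathrm{Top}]{\text{ties}} \zeta \quad B, C \xrightarrow[E,\,{<:}\mathrm{Top}]{\text{sub}} \text{ok} \quad C, B \xrightarrow[E,\,{<:}\mathrm{Top}]{\text{sub}} \text{ok}}{\mathrm{Rec}()B,\ \mathrm{Rec}()C \xrightarrow[E]{\text{sub}} \text{ok}}\ \ (\zeta \not\subseteq \{\text{pos}\})
\]
\[
\frac{T \xrightarrow[E]{\text{type}} \mathrm{Rec}()B \quad a \xrightarrow[E]{\text{term}} A \quad B \xrightarrow[1,\,\mathrm{Rec}()B]{\text{replace}} B' \quad A, B' \xrightarrow[E]{\text{sub}} \text{ok}}{\mathrm{fold}(:T)(a) \xrightarrow[E]{\text{term}} \mathrm{Rec}()B}
\]
\[
\frac{a \xrightarrow[E]{\text{term}} A \quad A \xrightarrow[E]{\text{exposeRec}} \mathrm{Rec}()B \quad B \xrightarrow[1,\,\mathrm{Rec}()B]{\text{replace}} B'}{\mathrm{unfold}(a) \xrightarrow[E]{\text{term}} B'}
\]
\[
\frac{S \xrightarrow[E]{\text{type}} A \quad a \xrightarrow[E,\,{:}A]{\text{term}} B \quad A \xrightarrow[1]{\text{lift}} A' \quad B, A' \xrightarrow[E,\,{:}A]{\text{sub}} \text{ok}}{\mathrm{rec}({:}S)a \xrightarrow[E]{\text{term}} A}
\]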



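Contracts tests whether a type is formally contractive in a variable.

\[
\text{type} : \text{Type}\ \xrightarrow[\text{index} : \text{Int}(=1)]{\text{contracts}}\ \text{result} : \{\text{ok}\}
\]

\[
n \xrightarrow[m]{\text{contracts}} \text{ok} \ \ (n \ne m)
\qquad
\mathrm{Top} \xrightarrow[m]{\text{contracts}} \text{ok}
\qquad
A \to B \xrightarrow[m]{\text{contracts}} \text{ok}
\qquad
\mathrm{All}({<:}A)B \xrightarrow[m]{\text{contracts}} \text{ok}
\]
\[
\frac{B \xrightarrow[1]{\text{contracts}} \text{ok} \quad B \xrightarrow[m+1]{\text{contracts}} \text{ok}}{\mathrm{Rec}()B \xrightarrow[m]{\text{contracts}} \text{ok}}
\]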
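
The contractivity test is short enough to transcribe directly. The following Python sketch assumes a tuple encoding of types that is not part of the original (`int` indices, `"Top"`, `("->", A, B)`, `("All", A, B)`, `("Rec", B)`) and returns a boolean instead of getting stuck.

```python
def contracts(ty, m=1):
    """True if `ty` is formally contractive in the variable with index `m`."""
    if isinstance(ty, int):
        # A bare variable is contractive only in other variables.
        return ty != m
    if ty == "Top" or ty[0] in ("->", "All"):
        # Any type constructor guards the variable.
        return True
    if ty[0] == "Rec":
        body = ty[1]
        # The body must be contractive in its own recursion variable
        # (index 1) and in the outer variable, now at index m + 1.
        return contracts(body, 1) and contracts(body, m + 1)
    raise ValueError(f"not a type: {ty!r}")
```

For example, `contracts(("Rec", 2))` is false, since the body of this recursive type is just the outer variable, while `contracts(("Rec", ("->", 1, 2)))` is true because the arrow guards both variables.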



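ExposeRec strips type variables until it finds a recursive type.

\[
\text{type} : \text{Type}\ \xrightarrow[\text{env} : \text{Env}]{\text{exposeRec}}\ \text{outType} : \text{Type}
\]

\[
\frac{E \xrightarrow[n]{\text{typeIde}} A \quad A \xrightarrow[E]{\text{exposeRec}} A'}{n \xrightarrow[E]{\text{exposeRec}} A'}
\qquad
\mathrm{Rec}()B \xrightarrow[E]{\text{exposeRec}} \mathrm{Rec}()B
\]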



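Ties computes the subtyping constraints between two recursive variables, by mimicking sub.

\[
\text{small}, \text{big} : \text{Type}\ \xrightarrow[\text{env} : \text{Env},\ \text{index} : \text{Int}(=1),\ \text{variance} : \{\text{pos}, \text{neg}\}(=\text{pos})]{\text{ties}}\ \text{result} : \mathcal{P}\{\text{pos}, \text{neg}\}
\]

\[
A, \mathrm{Top} \xrightarrow[E,i,v]{\text{ties}} \{\}
\qquad
n, n \xrightarrow[E,n,v]{\text{ties}} \{v\}
\qquad
n, n \xrightarrow[E,i,v]{\text{ties}} \{\} \ \ (i \ne n)
\]
\[
\frac{E \xrightarrow[n]{\text{typeIde}} A \quad A, B \xrightarrow[E,i,v]{\text{ties}} \zeta}{n, B \xrightarrow[E,i,v]{\text{ties}} \zeta}\ \ (B \ne n,\ B \ne \mathrm{Top})
\]
\[
\frac{A', A \xrightarrow[E,i,\neg v]{\text{ties}} \zeta \quad B, B' \xrightarrow[E,\,{:}A',\,i+1,\,v]{\text{ties}} \zeta'}{A \to B,\ A' \to B' \xrightarrow[E,i,v]{\text{ties}} \zeta \cup \zeta'}
\]
\[
\frac{A', A \xrightarrow[E,i,\neg v]{\text{ties}} \zeta \quad B, B' \xrightarrow[E,\,{<:}A',\,i+1,\,v]{\text{ties}} \zeta'}{\mathrm{All}({<:}A)B,\ \mathrm{All}({<:}A')B' \xrightarrow[E,i,v]{\text{ties}} \zeta \cup \zeta'}
\]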



B,B' C< B, B ' 

E, 1, pos 



f/es 



£,< : Top,;' + 1, v 



Rec ()B,Rec ( )B ' >£' 

E, i, v 



C efposj 
v C '={} 



B, B ' c< B, B 

£, 1, pos 



£,< : Top,;' + 1, v 



Rec ()B,Rec ()B '- 



ties 
E, i, v 



>{ pos, neg} 
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G.3. Typing algorithm with argument synthesis 



contracts , contracts , 

a ook Al 1 (q < : A)B ook 

m m 

ties „ ties 

A', A >i B,B' o£ 

ties ties E,i,-iv E,< : A ' ,i + l,v 

a,B— — o{} A,p— — H}(A*y) ^ ! s 

/i ''' V /V ' V All (q < : A)B, All (cf < : A ')B ' >£ U £' 

£, i, v 

///f , replace occurCheck 

B — >B' B — — >B' B s XT 

i, J + 1 i + 1, C a,r + \, a\\, i + 1 



//ft replace occurCheck ,« 

Rec ( )B— >Rec ( )B ' Rec()B — 0Rec()B' Rec()B 1X7 Ml 

/, j i, C a, r, a, i 



type contracts , ties „ sub 

T >B B Ook B,C o£ B,C ^WT 

£,<:Top £,<:Top £,<: Top, <T¥ /f , , 

We ^uY— n (££{P°s}) 

Rec ()T^— >Rec ( )B Rec ( )B,Rec ( )C tXT'W 

£ £,C7 

f/es „ iw/j sm/? 

B,C 06 B,C t-«7' C,B 0(7" 

£,<:Top £,<:Top,0lf £,<:Top,(7' ,„ , 

~ sub „„ " ( ^ { P° S}) 

Rec ( )B, Rec ( )C WJ"4 



fype ferm , replace sub 

T^— >Rec()B a >A,<r' B >B ' A, B ' 0(7" 

£ Ea_ l,Rec ()B E,a' 

term 

fold(:T) (a) 0Rec()B,(7" 

£(7 



ferm , exposeRec „ replace 

a OA, (7 A — oRec()B,(7 B i >B ' 

E,a E,a' 1, Rec ()B 

tevwi 

unfold (a) >B \o" 

Eg 



type term , lift sub 

S^— OA a a>B,<7 A— OA ' B,A' 1X7" 

E £: A,aV 1 E,:A,a' 

term „ M 

rec ( : S) (a) >A,o"Q 

E,a 
\[
\text{type} : \text{Type}\ \xrightarrow[\text{env} : \text{Env},\ \text{subst} : \text{Subst}]{\text{exposeRec}}\ \text{outType} : \text{Type},\ \text{outSubst} : \text{Subst}
\]



„ typelde exposeRec 
E— >A A — >A',a' 



Ea exposeRec 

Rec()B— >Rec()B,(7 



n exposeRec ^, & E ,o 
Ea 



retrieve exposeRec 

a >A A — >A',a 

a E,a 



exposeRec 

a— >a ',a 

E,a 



retrieve 

a >r 

a 



exposeRec 

a— c<Rec ( )a'),(a <- (Rec ()a'),a' r ,{a \ a)) 

E,a 



(a' new in a) 